Closed malcook closed 2 years ago
I'm not sure what you are suggesting. Obviously we can't have a general rule that if column 7 is present in a bigbed file that it is interpreted as a name.
I agree it is probably a "bad idea"(tm).
But...
Would you then conclude with me that the agreement between the wassermanlab and UCSC that "the bigbeds contain the TF name as an extra field" was a bad idea, insofar as these files do not conform to the bigBed spec (despite bigBedToBed happily rematerializing them, as below)?
bigBedToBed -chrom=chr1 -start=10001 -end=10005 http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/JASPAR2022_hg19.bb /dev/stdout
chr1 10001 10018 MA0883.1 328 - Dmbx1
chr1 10003 10013 MA0599.1 239 + KLF5
chr1 10003 10015 MA0712.2 275 - OTX2
chr1 10004 10013 MA0714.1 268 + PITX3
chr1 10004 10014 MA0467.2 314 - Crx
chr1 10004 10014 MA0891.1 265 + GSC2
chr1 10004 10019 MA1574.1 341 - THRB
FWIW: Consistent with their use of this arguably non-conforming bigBed format, the UCSC track configuration offers the choice to display 'TF Name' (col 7) instead of MatrixID (col 4) only for the 2022 version of this resource, as can be seen in this screenshot:
I guess I was suggesting that IGV follow suit somehow, but I understand if you close issue as not really being IGV's.
I don't know that there is a "spec" for bed files; after the first 3 columns anything goes, which makes it somewhat challenging. In some contexts I think this bed file would be referred to as "bed6+", as the first 6 columns are standard.
This is a custom UCSC track. In general I don't have the resources, in the parlance of our times, to do custom tracks, and we don't host this file in any event.
It's possible we could do something for this problem using the autoSQL. The solution would (probably) be to add a menu item the user could use to choose among the available columns for the name (in this case they would choose TFNAME). If you don't mind, we can leave this open and I'll rename it accordingly.
There's actually a very recent, official, spec for BED files. It was merged a month ago: https://github.com/samtools/hts-specs/pull/570 ;)
@brainstorm ok, great, might be helpful in the future. I don't see how it helps with this situation, however. In fact the document says that it does not specify a means of identifying the contents of columns 4-12. This information must be supplied "out-of-band". These are the columns I am referring to when I say there's not really a spec, they are not nailed down and you have to know what they mean by other means.
Some information about a BED file can only be supplied unambiguously separately from the data
lines of the BED file. This specification does not contain a means for interchanging this information.
Information that must be supplied out-of-band include:
• Which of the first 4 to 12 fields are standard BED fields and which are custom fields.
Hi,
Hmm, the Wasserman lab made this bed file and it’s entirely compatible with the bigBed spec, there is nothing different than for other bigBed files.
Why is it a “bad” idea to store the TF name in an extra field ?
@maximilianh I don't think it is a bad idea, and I agree it's entirely compatible. The issue here is that IGV uses the "name" field (column 4) as a label, and @malcook would prefer column 7 for this particular bigBed file. I renamed this issue to suggest that the autoSQL might be useful for IGV to present a choice of fields the user can use for the label. I will look into this possibility when I have time, so I'll leave the issue open. This is not a bigBed or UCSC issue, sorry for any confusion.
the solution would be (probably) to add a menu item the user could use to choose available columns for name (in this case they would choose TFNAME). If you don't mind we can leave this open and I'll rename it accordingly
who could ask for anything more?
I don't know that there is a "spec" for bed files,
Referring to the samtools BedV1 specification, I see now that the wassermanlab's files might be thought of as "bed6+1" with a single custom field.
I had been looking at https://genome.ucsc.edu/FAQ/FAQformat.html#format1 which purports to define the range for each column and does not refer to custom fields.
@malcook Understood, in practice we deal with the files as they exist. I think the autoSql might be helpful here.
I don't know the exact reason why the label field was changed, but this is not the only track where we did it like this. labelFields has been a valid trackDb statement for many years.
The trackDb of this track looks like this:
track jaspar
compositeTrack on
shortLabel JASPAR Transcription Factors
longLabel JASPAR Transcription Factor Binding Site Database
group regulation
visibility hide
type bigBed 6 .
pennantIcon New red ../goldenPath/newsarch.html#010522 "Released Jan. 6, 2022"
url http://jaspar.genereg.net/search?q=$$&collection=all&tax_group=all&tax_id=all&type=all&class=all&family=all&version=all
urlLabel View on JASPAR:
filter.score 400
filterByRange.score 0:1000
maxItems 100000
maxWindowCoverage 50000
exonArrows on
spectrum on

track jaspar2022
parent jaspar on
shortLabel JASPAR 2022 TFBS
longLabel JASPAR CORE 2022 - Predicted Transcription Factor Binding Sites
priority 1
type bigBed 6 +
visibility pack
motifPwmTable hgFixed.jasparCore2022
labelFields TFName
bigDataUrl /gbdb/$D/jaspar/JASPAR2022.bb
You can see that for this particular track, the field that is used for labeling is not "name" anymore but "TFName". This is not unusual. I guess the problem is that IGV doesn't read our trackDb, but if it's not doing that, then IGV will not be able to display the majority of our tracks as we show them, so this problem is not specific to this particular JASPAR file.
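For readers who want to see what "reading the trackDb" would involve for this one setting, here is a minimal Python sketch. It assumes a stanza with one "key value" pair per line, as in the jaspar2022 stanza above. The helper names parse_stanza and label_fields are made up for illustration, and the fallback to "name" when labelFields is absent is an assumption, not something stated in this thread. It is not how UCSC or IGV actually implement this.

```python
def parse_stanza(text):
    """Parse one trackDb stanza (one 'key value' pair per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        settings[key] = value.strip()
    return settings


def label_fields(settings):
    """Fields to use as item labels.

    Assumption: when labelFields is not set, the 'name' column is used.
    """
    raw = settings.get("labelFields", "name")
    return [f.strip() for f in raw.split(",") if f.strip()]


# Abbreviated copy of the child stanza quoted in this comment.
STANZA = """\
track jaspar2022
parent jaspar on
shortLabel JASPAR 2022 TFBS
type bigBed 6 +
visibility pack
labelFields TFName
bigDataUrl /gbdb/$D/jaspar/JASPAR2022.bb
"""

settings = parse_stanza(STANZA)
print(label_fields(settings))
```

A viewer that only sees the bigBed file never gets this stanza, which is why the label choice cannot be recovered from the file URL alone.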
From IGV's perspective this is just a bigBed file, so no, the trackDB is not read and I'm not even sure how it could be.
@maximilianh @malcook Perhaps a general fix for this problem, which maybe you are suggesting, would be to support loading from a track hub rather than directly from the bigBed. Of course loading directly from bigBed will always be supported.
the trackDB is not read and I'm not even sure how it could be
Hmm. Does it seem like I suggested it could? Did you mean to direct this comment to @maximilianh ?
@malcook yes (meant for maximilianh).
Note to self:
bigBedInfo -as http://expdata.cmmt.ubc.ca/JASPAR/downloads/UCSC_tracks/2022/JASPAR2022_hg19.bb
version: 4
fieldCount: 7
hasHeaderExtension: yes
isCompressed: yes
isSwapped: 0
extraIndexCount: 0
itemCount: 12,473,778,656
primaryDataSize: 119,887,888,128
primaryIndexSize: 782,301,588
zoomLevels: 10
chromCount: 93
as:
table JASPAR_TFBS
"TFBS predictions for profiles in the JASPAR CORE collections"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position of feature on chromosome"
uint chromEnd; "End position of feature on chromosome"
string name; "Matrix ID"
uint score; "Score"
char[1] strand; "+ or - for strand"
string TFName; "TF name"
)
basesCovered: 2,897,225,363
meanDepth (of bases covered): 46.102859
minDepth: 1.000000
maxDepth: 993.000000
std of depth: 43.105940
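The autoSql block in this output is what would let a viewer offer a choice of label columns. As a rough Python sketch (not IGV's implementation), the field names can be pulled out of the "as:" text and mapped to column positions. The regular expression only handles simple one-line declarations like the ones shown here, and the names autosql_fields and column_index are hypothetical.

```python
import re

# The autoSql definition exactly as printed by bigBedInfo -as above.
AUTOSQL = '''table JASPAR_TFBS
"TFBS predictions for profiles in the JASPAR CORE collections"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position of feature on chromosome"
uint chromEnd; "End position of feature on chromosome"
string name; "Matrix ID"
uint score; "Score"
char[1] strand; "+ or - for strand"
string TFName; "TF name"
)'''

# One declaration per line: <type> <name>; "<comment>"
FIELD_RE = re.compile(r'^\s*(\S+)\s+(\w+)\s*;\s*"([^"]*)"')


def autosql_fields(text):
    """Return [(type, name, comment), ...] in column order."""
    fields = []
    in_body = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("("):
            in_body = True   # declarations start after the opening paren
            continue
        if stripped.startswith(")"):
            break            # closing paren ends the table definition
        if in_body:
            match = FIELD_RE.match(line)
            if match:
                fields.append(match.groups())
    return fields


def column_index(text, field_name):
    """0-based column index of a named field; raises ValueError if absent."""
    names = [name for _type, name, _comment in autosql_fields(text)]
    return names.index(field_name)


for position, (_type, name, comment) in enumerate(autosql_fields(AUTOSQL), start=1):
    print(position, name, comment)
```

With the definition above, TFName comes out as the seventh column (0-based index 6), which is the column a "choose label field" menu would need to expose.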
Track hubs are supported by Ensembl, NCBI and UCSC. So yes, it would be great if IGV had some support for track hubs. A basic version could be very minimal, shortLabel and longLabel, visibility and type are the most important keywords.
@maximilianh I will do this. Although IGV is not in the same class as the big server-based browsers you mention, it is certainly worth doing. As a quick fix for JASPAR I'm thinking of just defining a "hosted" track in IGV for at least human and mouse assemblies using the basic data from the trackDB. I will not copy those 100GB bb files but rather reference them. Anyway, thanks for the tips and help as always.
Let us know if we can help with something. The trackDb specs are sometimes not documented well (e.g. genomes and hub.txt). It would be nice to implement useOneFile; I find it very useful, as it packs the three files into a single file.
https://genome.ucsc.edu/goldenPath/help/hubQuickStart.html
Thanks @maximilianh . RE "useOneFile", that would be the decision of the track hub creator, correct? I will support it where it's available.
defining a "hosted" track in IGV for at least human and mouse
@jrobinso - could you please include zebrafish in any short term patch solution - that is the use case that drove my initial request
@jrobinso - I'm still hoping somehow to be able to display as glyph label in IGV the bigbed's column 7 (TFName). Any chance of providing such functionality, possibly as a "workaround", in the near term (preferably not requiring reference to remote track hubs)?
@malcook A workaround would be to convert that file to a standard 12 column bed with the name you want in the standard name column. You can do this with a simple script.
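One way such a script could look, as a hedged Python sketch: it takes the tab-separated lines that bigBedToBed writes and moves the TF name from column 7 into the name column, dropping the extra column so the result is plain BED6 (rather than the 12 columns mentioned above, since the source only has six standard fields). The function names, the keep_matrix_id option and the "|" separator are all choices made for this illustration.

```python
def move_tf_name(line, name_col=3, tf_col=6, keep_matrix_id=False):
    """Rewrite one tab-separated bed6+1 record as BED6 labelled by TF name.

    Column indexes are 0-based: name_col=3 is the BED 'name' field and
    tf_col=6 is the extra TFName field of the JASPAR 2022 files.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= tf_col:
        return "\t".join(fields)  # no extra field: leave the record unchanged
    matrix_id, tf_name = fields[name_col], fields[tf_col]
    # Optionally keep the matrix ID alongside the TF name, e.g. "KLF5|MA0599.1".
    fields[name_col] = f"{tf_name}|{matrix_id}" if keep_matrix_id else tf_name
    return "\t".join(fields[:tf_col])  # drop the extra column


def convert(lines, **options):
    """Yield converted records for every non-blank input line."""
    for line in lines:
        if line.strip():
            yield move_tf_name(line, **options)


example = "chr1\t10003\t10013\tMA0599.1\t239\t+\tKLF5\n"
print(move_tf_name(example))
```

To use this as a filter, one could loop over sys.stdin with convert and print each result, feeding it the output of bigBedToBed for the region or chromosome of interest. Given the item count reported by bigBedInfo, converting a whole genome this way would produce a very large text file, so restricting to a region first is probably wise.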
snapshot looks good in my hands. Thanks so much!
@malcook I assume you found the "set label field" menu item.
Due to a change in the design of the bigbed files (as discussed in wassermanlab/JASPAR-UCSC-tracks#11), the Matrix ID is displayed as the label on the glyph when loaded in the IGV browser.
eg: https://user-images.githubusercontent.com/484282/148572447-ecbdbed0-b798-4bc5-824b-122608323bfe.png
This display is less useful to most end users than displaying the TF name.
The TF name is now present in column 7 of the underlying bed file instead of column 4 (as before).
UCSC genome browser accommodates the change by continuing to display the TF name.
IGV does not.
Arguably IGV could be improved by displaying column 7 value if present, otherwise displaying the name column (4).
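To make the proposed rule concrete, here is a small Python sketch of "use column 7 if present, otherwise column 4". It only illustrates the suggestion and is not IGV code; the function name glyph_label and the treatment of "." as an empty value are assumptions. Such a rule cannot tell whether a seventh column really holds a name, so it would only be safe for files known to follow this layout.

```python
def glyph_label(fields, preferred_col=6, name_col=3):
    """Pick a label for one BED record from its already-split columns.

    Indexes are 0-based, so preferred_col=6 is column 7 and name_col=3
    is the standard BED name column (column 4).
    """
    if len(fields) > preferred_col and fields[preferred_col] not in ("", "."):
        return fields[preferred_col]
    if len(fields) > name_col:
        return fields[name_col]
    return ""  # BED3 record: nothing to label with


print(glyph_label(["chr1", "10003", "10013", "MA0599.1", "239", "+", "KLF5"]))
print(glyph_label(["chr1", "10003", "10013", "MA0599.1", "239", "+"]))
```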
(Note: I brought this up tangentially in https://github.com/igvteam/igv/issues/1085#issuecomment-1007533381, which was resolved without addressing this tangent, so I thought I'd give it its own issue...)
(Note: A workaround could be to reformat the bigbed to use IGV's neat ability to display GFF column 9 formatted attribute-value pairs when they appear in column 4; however, you might agree it would be advantageous to use the bigbeds as produced by wassermanlab.)