SOFIIITA opened this issue 2 years ago
Did you try following our instructions for customizing BLAT here? If so, what problems did you encounter?
Yes, I have tried, but I can't get the correct URL for Pichia pastoris.
Aha! This is one of our new bulk genome browsers (aka GenArk). We have one big BLAT server that serves thousands of genomes. IGV could support this … it's a fixed BLAT server name.
@maximilianh Thanks for responding. What is the URL of the BLAT server? IGV currently uses the following URL, which is user-settable:
https://genome.ucsc.edu/cgi-bin/hgBlat?userSeq=$SEQUENCE&type=DNA&db=$DB&output=json
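A minimal sketch of how a client might fill in this user-settable template. The helper name `fill_blat_template` is hypothetical and this is not IGV's actual (Java) implementation; it only assumes that `$SEQUENCE` and `$DB` are plain-text placeholders to be replaced.

```python
from urllib.parse import quote

# Template quoted in this thread; $SEQUENCE and $DB are placeholders.
BLAT_TEMPLATE = (
    "https://genome.ucsc.edu/cgi-bin/hgBlat"
    "?userSeq=$SEQUENCE&type=DNA&db=$DB&output=json"
)


def fill_blat_template(template: str, sequence: str, db: str) -> str:
    """Substitute the placeholders in a BLAT URL template (hypothetical helper)."""
    # Percent-encode values defensively; plain DNA letters and IDs such as
    # "hg38" or "GCF_000027005.1" contain only unreserved characters and pass through unchanged.
    return (template
            .replace("$SEQUENCE", quote(sequence, safe=""))
            .replace("$DB", quote(db, safe="")))
```

For example, `fill_blat_template(BLAT_TEMPLATE, "ACGT", "hg38")` yields `https://genome.ucsc.edu/cgi-bin/hgBlat?userSeq=ACGT&type=DNA&db=hg38&output=json`.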
The new BLAT server is called dynablat, and it can be found in the hub.txt -> genomes.txt file that the user referenced. The link the user sent leads to a Genome Browser session; when you click "My Data > My Hubs" there, you can find an entry for this hub under "Connected Hubs". Its first link points to the hub.txt, which points to this genomes.txt, which contains the BLAT information: https://hgdownload.soe.ucsc.edu/hubs/fungi/genomes.txt
In short: most GCF and a few GCA assemblies are served from this blat server: https://hgdownload.soe.ucsc.edu/hubs/fungi/genomes.txt
These are the ones we currently support, but we add one or two more genomes every day: https://hgdownload.soe.ucsc.edu/hubs/UCSC_GI.assemblyHubList.txt
All described here: https://hgdownload.soe.ucsc.edu/hubs/
Thanks @maximilianh, helpful as always, but it's still not clear to me how to construct a webservice URL to do the BLAT search. The "blat" property is not described here: https://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp. I found some information here: https://genome.ucsc.edu/goldenPath/help/hubQuickStartAssembly.html, but it seems to be instructions for setting up BLAT for use with an instance of the UCSC Genome Browser, specifically with "the blat page". I don't see anything that looks like a BLAT webservice.
Concretely, I see this entry in genomes.txt
blat dynablat-01.soe.ucsc.edu 4040 dynamic GCF/000/027/005/GCF_000027005.1
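A minimal sketch of splitting such a `blat` line into its parts. The field meanings (host, port, a `dynamic` keyword, then a genome-specific path) are inferred from this single example rather than from a spec, and the names `BlatSetting` and `parse_blat_setting` are hypothetical. As the thread goes on to show, the host and port are not a URL to call directly; the last path component is the NCBI accession, which later turns out to work as hgBlat's `db` parameter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlatSetting:
    host: str
    port: int
    dynamic: bool
    path: Optional[str]  # genome-specific path for a dynamic server (assumed meaning)


def parse_blat_setting(line: str) -> BlatSetting:
    """Parse a genomes.txt 'blat' line (field meanings inferred, hypothetical helper)."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "blat":
        raise ValueError(f"not a blat setting: {line!r}")
    host, port = parts[1], int(parts[2])
    dynamic = len(parts) > 3 and parts[3] == "dynamic"
    # Only dynamic entries are assumed to carry the extra path field.
    path = parts[4] if dynamic and len(parts) > 4 else None
    return BlatSetting(host, port, dynamic, path)
```

For the line above, `parse_blat_setting(...).path.rsplit("/", 1)[-1]` gives `GCF_000027005.1`.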
How does one construct a URL to do a blat from this information? For "traditional" assemblies I use a URL of this form, where $SEQUENCE is the user sequence and $DB the genome ID (e.g. hg38).
https://genome.ucsc.edu/cgi-bin/hgBlat?userSeq=$SEQUENCE&type=DNA&db=$DB&output=json
I tried this as a guess, but the request times out:
https://dynablat-01.soe.ucsc.edu:4040/GCF/000/027/005/GCF_000027005.1?userSeq=AAATAGGGACCTAGTTGTTGTTTGATAATTTTTTCTGCTGTATAGATATAATACGCCATGCGGTAGTAAA&type=DNA&output=json
Hi @maximilianh, if it's not possible to do this, could you let me know and I'll close this ticket? By "this" I mean calling the dynablat server at dynablat-01.soe.ucsc.edu as a webservice, as we do with https://genome.ucsc.edu/cgi-bin/hgBlat for human and other genomes. Thanks.
Hi @jrobinso, sorry, I had missed your last note (I was traveling).
I misunderstood your question above at first. To run hgBlat, you should now be able to use the NCBI accession anywhere on our website, just like a UCSC db parameter. If that doesn't work somewhere, let us know and we will fix it.
So, e.g., for BLAT you should be able to run queries here as before; this is working for me:
@maximilianh Brilliant, thanks! I should have tried that.
@SOFIIITA So for your case, where you are in effect using a "custom genome" or FASTA, you can change the BLAT URL in the advanced preferences to the following:
https://genome.ucsc.edu/cgi-bin/hgBlat?userSeq=$SEQUENCE&type=DNA&db=GCF_000027005.1&output=json
@SOFIIITA And if BLAT still doesn't work, we may have to add the genome in question. This one works, as do a few thousand others, but not all genomes are available yet. You can send a quick email to @.***; we can usually add them within a few hours.
Jim, maybe you or we should document this somewhere. Also, maybe we should provide IGV configuration files so that people can open all these assemblies directly without having to configure them themselves? Do you have a format that people can just double-click or open in some way?
@maximilianh Yes, I didn't realize this was so easy; I need to make some changes to take advantage of it. For BLAT specifically I need to do two things: (1) enable BLAT for all genomes using the NCBI accession number (currently it filters out genomes that do not have a UCSC genome DB id), and (2) ensure that all our genome configuration files have the correct identifier for BLAT.
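A minimal sketch of change (1): instead of filtering out genomes without a UCSC DB id, fall back to the NCBI accession for hgBlat's `db` parameter. The helper name `blat_db_for` is hypothetical, and the accession pattern (`GCA_`/`GCF_`, nine digits, a dot, a version) is an assumption about accession shape, not IGV's actual logic.

```python
import re
from typing import Optional

# Assumed shape of NCBI assembly accessions, e.g. "GCF_000027005.1".
_ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def blat_db_for(ucsc_db: Optional[str], ncbi_accession: Optional[str]) -> Optional[str]:
    """Pick the identifier to pass as hgBlat's db parameter (hypothetical helper)."""
    # Prefer a native UCSC database name such as "hg38".
    if ucsc_db:
        return ucsc_db
    # Otherwise fall back to the NCBI accession instead of disabling BLAT.
    if ncbi_accession and _ACCESSION_RE.match(ncbi_accession):
        return ncbi_accession
    return None  # no usable identifier: BLAT stays disabled for this genome
```

With this, a custom Pichia pastoris genome configured with accession `GCF_000027005.1` would be BLAT-enabled rather than filtered out.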
In the meantime, I will update this page so that users can do this manually in their IGV preferences for specific assemblies: https://github.com/igvteam/igv/wiki/Customizing-BLAT
For the more general goal of being able to just open any genome that has a track hub, that is complex, but I am making progress slowly. IGV was modeled after UCSC for the most part, but there are enough conceptual differences that I still need to make manual tweaks to convert the various track hub files to an IGV genome JSON definition. This conversion is currently done offline in Python, mostly in Colab notebooks, with manual tweaking steps. The goal is to remove the need for manual tweaking, but I'm not there yet.
Should we create one JSON file per assembly hub? Once you have a script that we can run, we can totally do that.
@maximilianh Currently, for IGV desktop, the assembly and its associated track hub files, etc., are turned into a minimal IGV genome JSON file and a collection of files for the IGV tracks menu, including some XML. This is not optimal; the next step for me is to combine all of this into a single JSON file. When I have something that does that, I would value your input on it.
Coincidentally, I was just working on this hub for another user request: https://hgdownload.gi.ucsc.edu/hubs//GCA/011/100/615/GCA_011100615.1/
If you could accept our .ra text format as an alternative to JSON, we wouldn't have to create a separate set of files.
.ra can be translated 1:1 into a list of JSON dictionaries.
The hub.txt, genomes.txt, and trackDb.txt files are all in .ra format. One can also combine them all into a single hub.txt file. I can provide minimal examples.
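A minimal sketch of the 1:1 translation described above, assuming that stanzas are separated by blank lines, that each line is a key followed by whitespace and a value, and that lines starting with `#` are comments. It ignores `include` statements and line continuations, and a repeated key within a stanza keeps its last value; `parse_ra` and `ra_to_json` are hypothetical names.

```python
import json
from typing import Dict, List


def parse_ra(text: str) -> List[Dict[str, str]]:
    """Translate .ra text into a list of dicts, one per stanza (simplified)."""
    stanzas: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()  # trackDb stanzas may be indented
        if not line:
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # comment line
        parts = line.split(maxsplit=1)  # key, then the rest of the line as value
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


def ra_to_json(text: str) -> str:
    """Serialize the parsed stanzas as a JSON list of dictionaries."""
    return json.dumps(parse_ra(text), indent=2)
```

A genomes.txt stanza such as `genome GCF_000027005.1` followed by the `blat ...` line from earlier in this thread becomes `{"genome": "GCF_000027005.1", "blat": "dynablat-01.soe.ucsc.edu 4040 dynamic GCF/000/027/005/GCF_000027005.1"}`.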
@maximilianh My goal is to work off what you have, specifically the hubs at https://hgdownload.gi.ucsc.edu/hubs/ which I can already parse, so JSON is certainly not required. Most of the remaining work is to eliminate my manual steps; there is no problem with the formats of these files. For example, I need to add support for the .2bit format directly in IGV so I don't need to convert to FASTA and index it. I will get there eventually; I appreciate all your help.
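Reading .2bit directly starts with its header and sequence index. A minimal sketch, based on our understanding of the UCSC .2bit layout: a 16-byte header (signature 0x1A412743, version, sequence count, reserved), then for each sequence a 1-byte name length, the name, and the byte offset of its record. The byte order is detected from the signature. Treating version 1 as using 64-bit offsets is an assumption to check against the format spec. This only builds the name-to-offset index and does not decode bases; `read_2bit_index` is a hypothetical name, not IGV code.

```python
import struct
from typing import Dict

TWO_BIT_SIGNATURE = 0x1A412743


def read_2bit_index(data: bytes) -> Dict[str, int]:
    """Return {sequence name: byte offset of its record} from .2bit header bytes."""
    # The signature's byte order tells us the file's endianness.
    if struct.unpack("<I", data[:4])[0] == TWO_BIT_SIGNATURE:
        endian = "<"
    elif struct.unpack(">I", data[:4])[0] == TWO_BIT_SIGNATURE:
        endian = ">"
    else:
        raise ValueError("not a .2bit file")
    version, count, _reserved = struct.unpack(endian + "III", data[4:16])
    if version not in (0, 1):
        raise ValueError(f"unsupported .2bit version {version}")
    # Version 0: 32-bit offsets; version 1 (assumed): 64-bit offsets.
    offset_fmt = endian + ("Q" if version == 1 else "I")
    offset_size = struct.calcsize(offset_fmt)
    pos = 16
    index: Dict[str, int] = {}
    for _ in range(count):
        name_size = data[pos]  # 1-byte name length
        pos += 1
        name = data[pos:pos + name_size].decode("ascii")
        pos += name_size
        (offset,) = struct.unpack(offset_fmt, data[pos:pos + offset_size])
        pos += offset_size
        index[name] = offset
    return index
```

In practice one would read only the first few kilobytes of a remote file (e.g. with an HTTP range request) and grow the buffer if the index is longer.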
@maximilianh BTW, on the subject of 2bit, what is the "2bit.bpt" file? I've tried searching but haven't found a definition.
twoBitBptUrl ../GCA/004/027/145/GCA_004027145.1/GCA_004027145.1.2bit.bpt
OK, great. Well, let us know if we can do something to help. I know that a Java parser for 2bit already exists. If you need additional files or more details about the structure, or if we can add additional indexes in some form, let me know. If you find anything weird in the hub.txt files, I'm interested, too.
Hello, I want to search for a sequence in my alignment against a reference genome, but an error appears every time I use the reference genome of the yeast Pichia pastoris: it says the genome is not found on the BLAT server.
Our reference genome, Pichia pastoris (Komagataella phaffii), is on the BLAT server (https://genome.ucsc.edu/cgi-bin/hgGateway?hgsid=1477309087_46WfLHqRloEjArEvaGCmHJOsqPKg), but we can't load it in IGV to use the BLAT server to find a sequence in our alignment.