igvteam / igv

Integrative Genomics Viewer. Fast, efficient, scalable visualization tool for genomics data and annotations
https://igv.org
MIT License
646 stars 387 forks

Support specifying file format for URLS ("Tracks from presigned URL *without* the file extension") #1526

Closed by SergejN 5 months ago

SergejN commented 5 months ago

Dear IGV team,

For reasons outside my control, the BAM/BAI files are stored in an S3 bucket without file extensions. I can generate presigned URLs to load the files in IGV, but IGV raises an error because it cannot guess the file type:

"Cannot determine file type of: https://***.amazonaws.com/data/without_extension_bam?X-Amz......."

Is there a way to explicitly force IGV to interpret the file as BAM/BAI?

This is an idea rather than a bug report: I wouldn't know how to handle files without an extension either, unless told explicitly.

Thank you!

jrobinso commented 5 months ago

Currently there is no way to do this without a file extension. Any suggestions?

SergejN commented 5 months ago

Not a super elegant one: prefix the URL with the file type, e.g. bam:https://my_s3_bucket/my_file_with_no_extension?additional-aws-param=..... An alternative would be to add a dropdown menu to the dialog that lets the user specify the data type.

jrobinso commented 5 months ago

Another idea is a URL parameter, e.g. format=bam. This would not actually get passed to S3, just parsed by IGV.
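To make the URL-parameter idea concrete, here is a minimal sketch in Python. It is not IGV's implementation; the function name `split_format_hint` and the example URL are hypothetical. It pulls a `format` hint out of the query string and returns the URL without it, leaving every other parameter byte-for-byte unchanged so that a presigned signature computed over the original query would still match.

```python
from urllib.parse import urlsplit, urlunsplit, unquote_plus


def split_format_hint(url, param="format"):
    """Return (url_without_hint, hint). Hypothetical helper, not an IGV API.

    The query string is filtered pair by pair instead of being decoded and
    re-encoded, so the remaining parameters keep their exact original encoding.
    """
    parts = urlsplit(url)
    kept = []
    hint = None
    for pair in parts.query.split("&"):
        name, _, value = pair.partition("=")
        if name == param and hint is None:
            # First occurrence of the hint: record it and drop it from the URL.
            hint = unquote_plus(value).lower() or None
        else:
            kept.append(pair)
    clean_url = urlunsplit(parts._replace(query="&".join(kept)))
    return clean_url, hint


if __name__ == "__main__":
    # Example URL is made up for illustration.
    example = ("https://example.com/data/without_extension_bam"
               "?X-Amz-Signature=abc%2Fdef&format=bam&X-Amz-Expires=3600")
    print(split_format_hint(example))
```

The cleaned URL is what would be sent to the server; the hint is used only locally to choose a reader.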

SergejN commented 5 months ago

That's true. We also thought about that, inspired by an old Google Groups thread: https://groups.google.com/g/igv-help/c/2xsxhvHDDa0?pli=1 . We are currently checking whether an AWS presigned URL can carry an additional parameter.

jrobinso commented 5 months ago

Again, it is not necessary that AWS sees the additional parameter. IGV could just parse it and remove it before making the requests.

However, the dropdown menu option is not a bad one. It might be the simplest to implement.

jrobinso commented 5 months ago

There are a few formats, BAM being one, that can be determined from the file itself, as they define required magic number bits at the front of the file. I'm going to look into implementing auto-detection for these formats when the format cannot otherwise be determined. This should work for bam, cram, bigwig/bigbed, vcf, and sometimes gff.
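As an illustration of magic-number detection, the following Python sketch inspects the first bytes of a file and guesses the format. It is an assumption-laden sketch, not the code that went into IGV: the function name `sniff_format` is hypothetical, and the signatures used are the commonly documented ones (`BAM\1` inside a gzip/BGZF block, `CRAM`, the bigWig and bigBed 32-bit magic numbers in either byte order, and the `##fileformat=VCF` and `##gff-version` header lines). GFF is only detected when the file actually starts with the version directive, which may be why it works only sometimes.

```python
import gzip
import zlib
from typing import Optional

# 32-bit magic numbers for the bigWig and bigBed containers.
_BBI_MAGIC = {0x888FFC26: "bigwig", 0x8789F2EB: "bigbed"}


def sniff_format(head: bytes) -> Optional[str]:
    """Guess a format from the leading bytes of a file (hypothetical helper)."""
    if head[:2] == b"\x1f\x8b":
        # gzip/BGZF container: inflate whatever part of the first block we have.
        try:
            head = zlib.decompressobj(zlib.MAX_WBITS | 16).decompress(head)
        except zlib.error:
            return None
    if head.startswith(b"BAM\x01"):
        return "bam"
    if head.startswith(b"CRAM"):
        return "cram"
    if len(head) >= 4:
        # The magic number may be stored in either byte order.
        for order in ("little", "big"):
            fmt = _BBI_MAGIC.get(int.from_bytes(head[:4], order))
            if fmt:
                return fmt
    if head.startswith(b"##fileformat=VCF"):
        return "vcf"
    if head.startswith(b"##gff-version"):
        return "gff"
    return None


if __name__ == "__main__":
    # A gzip-wrapped stand-in for the start of a BAM file.
    fake_bam = gzip.compress(b"BAM\x01" + b"\x00" * 8)
    print(sniff_format(fake_bam))
```

For a remote file, only the first few kilobytes would need to be fetched (for example with an HTTP range request) before calling such a function.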

jrobinso commented 5 months ago

I think this should be fixed, and you shouldn't need to do anything. As outlined above, if no file extension is present, the file type will be determined from the required header bits for those formats that define them, which includes BAM. This change is available in the snapshot build. If you are able to test it, that would be helpful. The snapshot build can be downloaded from https://igv.org/doc/desktop/#DownloadSnapshot/.