gymrek-lab / LongTR

Tandem repeat genotyping with long reads
GNU General Public License v2.0

Feature request: Support remote cram files #10

Open wdecoster opened 3 weeks ago

wdecoster commented 3 weeks ago

Hi,

Would you be interested in supporting genotyping from remote CRAM files? I noticed that it only requires commenting out two lines at https://github.com/gymrek-lab/LongTR/blob/d9323818eea55cbf55ac72ee7992c6b901a25bdc/src/bam_io.cpp#L70

Maybe a --remote flag that disables checking if the file exists? Happy to try a PR, but I'm not a C++ programmer.

Thanks, Wouter

heliziii commented 2 weeks ago

Hi Wouter,

I apologize for my delayed response. We run a minimal check on local files so that we can tell users when their input path is wrong, but I added exceptions for remote files here:

https://github.com/gymrek-lab/LongTR/blob/d9323818eea55cbf55ac72ee7992c6b901a25bdc/src/bam_io.h#L465
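To illustrate the idea, here is a minimal sketch of such a check. It is not the code at the link above: the helper names `is_remote_path` and `input_looks_usable` and the list of URL prefixes are assumptions made up for this example. The existence check is skipped when the path starts with a remote scheme, and actually opening the URL is left to the I/O library.

```cpp
#include <array>
#include <fstream>
#include <string>
#include <string_view>

// Hypothetical helper (not LongTR's actual function): returns true when the
// path starts with one of a few URL schemes. The list is illustrative only.
bool is_remote_path(std::string_view path) {
    static constexpr std::array<std::string_view, 4> prefixes = {
        "http://", "https://", "ftp://", "s3://"};
    for (std::string_view prefix : prefixes) {
        // substr clamps the length, so paths shorter than the prefix are safe.
        if (path.substr(0, prefix.size()) == prefix) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: local paths must be openable; remote paths are
// accepted here and left for the I/O library to open (or fail on) later.
bool input_looks_usable(const std::string& path) {
    if (is_remote_path(path)) {
        return true;
    }
    std::ifstream file(path);
    return file.good();
}
```

With this approach, a mistyped local path is still reported early, while a URL is passed through unchanged, so no separate `--remote` flag would be needed.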

May I ask where your file is stored? I can add more exceptions to the list.

Apologies again for the delay.

Best, Helia

wdecoster commented 2 weeks ago

Hi Helia,

Thank you!

The files of interest are available over HTTPS and S3, for example:

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/hg38/HG00096.hg38.cram

https://s3.amazonaws.com/1000g-ont/ALIGNMENT_AND_ASSEMBLY_DATA/100_PLUS/IN-HOUSE_MINIMAP2/HG00097-ONT-hg38-R10-LSK114-dorado034_sup_5mCG_5hmCG_v33/HG00097-ONT-hg38-R10-LSK114-dorado034_sup_5mCG_5hmCG_v33.phased.bam

Wouter

heliziii commented 2 weeks ago

Hi again,

This should be fixed now. Could you please pull the latest version and check whether LongTR works on your CRAM files on FTP?

Best, Helia