niemasd / ViralConsensus

Fast viral consensus genome reconstruction
https://niema.net/ViralConsensus/
GNU General Public License v3.0

Fix CRAM support (provide ref genome to htslib file open) #6

Closed: niemasd closed this issue 1 year ago

niemasd commented 1 year ago

Right now, when ViralConsensus receives a CRAM file, it does not decode it with the user-provided local reference genome file. Instead, it seems to read the reference MD5 checksums (the M5 tags) from the CRAM header and download the matching reference sequence from EBI. I need to fix this so that it uses the user-provided FASTA file. The current behavior works (though not ideally) if the user has a network connection, but it will break if the user is offline or can't connect to EBI.

This is the line where I'm opening the input SAM/BAM/CRAM file:

https://github.com/niemasd/ViralConsensus/blob/a8df409692d1cd8d43a2ffcf5d2ce64166da1725/count.cpp#L63

I think I need to figure out how to tell hts_open where the local reference FASTA file is located.

niemasd commented 1 year ago

This seems to be the exact issue we have: https://github.com/samtools/htslib/issues/1015

niemasd commented 1 year ago

Maybe I can use this?

https://github.com/samtools/htslib/blob/f4a3b994be8f7904caeb5d58eaa14952ad39f2ea/cram/cram_io.c#L3591

niemasd commented 1 year ago

Maybe via the code example here?

https://github.com/samtools/samtools/issues/145#issuecomment-35069219

niemasd commented 1 year ago

Attempted fix in: https://github.com/niemasd/ViralConsensus/commit/0b99fcade0d3a6769a27819ee92bf77d6ebd2893

@daniel-ji @robertaboukhalil Can you check whether this commit fixes the BioWASM ViralConsensus CRAM support issue? I wasn't able to reproduce the CRAM issue with ViralConsensus v0.0.2 on my local machine, even with internet disabled (it ran fine without crashing), so I don't know whether this actually fixed it.

niemasd commented 1 year ago

Seems like it's fixed! The fix is incorporated into ViralConsensus v0.0.3.