mourisl / T1K

T1K is a versatile methods to genotype highly polymorphic genes (e.g. KIR, HLA) with bulk or single-cell RNA-seq, WGS or WES data.
MIT License
42 stars 7 forks source link

Support for cram files? #6

Open moutsian opened 1 year ago

moutsian commented 1 year ago

Hi,

This is a question rather than an issue. I was wondering if there is current support for cram files as input (instead of bams) or whether this is to come in a subsequent version?

Cheers, Loukas

mourisl commented 1 year ago

The installation of htslib is a bit hassle, so we don't have it directly in the package. But, after cloning T1K package, you can manually download the htslib-1.15.1 to the folder. Then you can run "htslib=1 make". Then, T1K can work with CRAM file.

If you want to use another htslib version or change the htslib path, you can modify the Makefile and the header of the "alignments.hpp".

moutsian commented 1 year ago

Thank you for your reply. Would it then run with the same command as for a bam file, e.g. run-t1k --preset hla-wgs \ -c /path/to/hla_dna_coord.fa -f /path/to/hla_dna_seq.fa -b /path/to/genome.cram --od /path/to/output \ --alleleDigitUnits 2 --alleleDelimiter ":" or does the bam-extractor function need to be modified?

Cheers, Loukas

mourisl commented 1 year ago

Yes, the command would be the same as running with a bam file.

alicekiki commented 1 year ago

Does it require any particulr command? I don't see any change if only downloading this package and run the command.

mourisl commented 1 year ago

@alicekiki It does not require particular command options. All you need to do is to compile T1K with htslib as instructed above.

alicekiki commented 1 year ago

Thank you! It worked out.

Bondada20 commented 1 month ago

@mourisl, could you please help me out? I am encountering some errors while using a .cram file as an import. The .cram file I am using is from the 1000 Genomes Project. The .bam files are too large, and if possible, I would like to use the .cram files to run t1k.

Here is the code I ran: ./run-t1k -t 20 --preset hla-wgs -c ./t1k_resources/hlaidx/hlaidx_dna_coord.fa -f ./t1k_resources/hlaidx/hlaidx_dna_seq.fa -b path/to/cram/file/HG00138.final.cram -o ./../t1k_resources/t1k_output/NA19625

I encountered the following errors: [bam_header_read] EOF marker is absent. The input is probably truncated. [bam_header_read] invalid BAM binary header (this is not a BAM file). Can not open path/to/cram/file/HG00138.final.cram

I followed the instructions to install htslib and then ran htslib=1 make, which I think worked successfully (see attached screenshot). Screenshot 2024-07-22 123745

I think all necessary files were created after running "htslib=1" make.

I have a question: doesn't the .cram file only store information that is different from that of the .fa file used during the alignment, resulting in a smaller file size compared to the .bam file? Wouldn't this mean that we need to use the .fa file (used during the alignment) somewhere in the run-t1k process?

@alicekiki, how did you manage to get this working?

Thanks, and I look forward to your reply.

alicekiki commented 1 month ago

@Bondada20 https://github.com/mourisl/T1K/blob/master/Makefile, i first modified the ninth row to 'htslib=1', and then run install. Try and see if there is a difference.

Bondada20 commented 1 month ago

@alicekiki by "run install" do you mean to run "make" after modifying the ninth row to "htslip=1" or..? Also, the .cram file you are using, is it publicly available so I can test it?

Also, if you have time, can you test any of the .cram files from here "https://www.ebi.ac.uk/ena/browser/view/PRJEB31736" and see if it works?

alicekiki commented 1 month ago

@Bondada20 image Modify Between step1 and step2. Dont have publicly available cram file for the moment.

mourisl commented 1 month ago

Have you modified the header in the alignment.h file to match your htslib version?

I think for the CRAM file, htslib will internal try to download the reference genome and it will be cached for future use. If you already have the reference genome, maybe the tutorial in the end of https://www.htslib.org/workflow/cram.html is useful.

Bondada20 commented 1 month ago

After some trial and error, I finally got it to work. See below for anyone having issues setting it up. Note: I am working on a server with restrictions on what I can do.

  1. As @mourisl mentioned above, if you already have the reference genome, follow the instructions under "The REF_PATH and REF_CACHE" here: https://www.htslib.org/workflow/cram.html
  2. Clone the T1K package.
  3. Edit line 9 of the Makefile by changing htslip=0 to htslip=1.
  4. Download htslib-1.15.1 to the T1K folder. Find releases here: https://github.com/samtools/htslib/releases
  5. Change directory into htslib-1.15.1 and run: autoreconf -i ./configure make
  6. Add HTSlib to the standard library path: export LD_LIBRARY_PATH=/path/to/htslib:$LD_LIBRARY_PATH
mourisl commented 1 month ago

Thank you for sharing the steps. There is no need to explicitly edit line 9 of the Makefile. When running "make", run it like "htslib=1 make" should set the value of htslib to 1.