mourisl / T1K

T1K is a versatile method to genotype highly polymorphic genes (e.g. KIR, HLA) with bulk or single-cell RNA-seq, WGS or WES data.
MIT License
42 stars 7 forks

Long Processing Time #7

Closed (ShawnSimp closed 9 months ago)

ShawnSimp commented 1 year ago

Hi, this is more a query than an issue. Any idea why a sample could hang at the analyzer step (--preset hla) for more than 24 hours with -t 30? I have only witnessed it with some FASTQ files, and I have yet to investigate the files themselves, but I can say it does not seem related to the number of reads or the size of the file.

Thanks for any possible insights.

mourisl commented 1 year ago

The analyzer step will call the SNPs, which involves enumerating all the alleles to which the multiple-aligned reads in that region are aligned. I guess in your case, a region happens to involve too many alleles and will take a long time to complete. If you can share the _aligned and *allele.tsv files with me, I can take a look at them and improve the time efficiency later.
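To make the point above concrete, here is a toy sketch of why one region can dominate the runtime: it counts how many distinct alleles the reads in each region are aligned to. The `(region, read_id, allele)` tuple layout and the function name `alleles_per_region` are assumptions made for illustration only; this is not T1K's file format or its internal logic.

```python
from collections import defaultdict


def alleles_per_region(alignments):
    """Count distinct alleles hit by reads in each region.

    `alignments` is an iterable of (region, read_id, allele) tuples.
    This layout is hypothetical and only serves the illustration.
    """
    region_alleles = defaultdict(set)
    for region, _read_id, allele in alignments:
        # A multi-aligned read contributes one tuple per allele it hits.
        region_alleles[region].add(allele)
    return {region: len(alleles) for region, alleles in region_alleles.items()}


if __name__ == "__main__":
    toy = [
        ("region1", "read_a", "A*01"),
        ("region1", "read_a", "A*02"),
        ("region1", "read_b", "A*02"),
        ("region2", "read_c", "B*07"),
    ]
    counts = alleles_per_region(toy)
    # Regions with the largest counts are the likeliest slow spots.
    for region, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        print(region, n)
```

A region whose count is far above the others would be the first place to look when the analyzer step stalls.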

If you only need the genotyping information, you can run T1K with the option "--skipPostAnalysis" to conduct genotyping only. In your case, since the genotyping has already finished, you can safely kill the job and directly use the genotyping results in the *_genotype.tsv file.
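As a rough sketch of the suggestion above, the snippet below assembles a T1K command line that skips the post-analysis step and can optionally run it under a time limit, so a stuck job does not sit for a day. The executable name `run-t1k`, the helper names, and the timeout value are assumptions; only `--preset hla`, `-t 30` and the skip option come from this thread, and any input or reference arguments have to be supplied by the caller as they would be on the normal command line.

```python
import subprocess


def build_t1k_command(base_args, threads=30, skip_post_analysis=True,
                      executable="run-t1k"):
    """Return the argument list for a T1K run.

    `base_args` holds the caller's usual arguments (inputs, reference,
    preset). `executable` is an assumed wrapper name; adjust as needed.
    """
    cmd = [executable, *base_args, "-t", str(threads)]
    if skip_post_analysis:
        # Genotyping only; the slow analyzer step is not run.
        cmd.append("--skipPostAnalysis")
    return cmd


def run_with_timeout(cmd, timeout_seconds):
    """Run the command; return False if it exceeds the time limit."""
    try:
        subprocess.run(cmd, check=True, timeout=timeout_seconds)
        return True
    except subprocess.TimeoutExpired:
        return False


if __name__ == "__main__":
    # Only the preset is shown; add your own input and reference arguments.
    print(" ".join(build_t1k_command(["--preset", "hla"])))
```

Keeping the command builder separate from the runner makes it easy to log or inspect the exact command before launching a long job.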

ShawnSimp commented 1 year ago

Ah okay! Thanks so much, I'll see if I can provide you with the _aligned and allele.tsv files.

ShawnSimp commented 1 year ago

Also, you should correct the parameter description on GitHub: "--skipPostAnaysis" should be "--skipPostAnalysis".

mourisl commented 1 year ago

Nice catch! It is fixed now. Thank you.