nanoporetech / modkit

A bioinformatics tool for working with modified bases
https://nanoporetech.com/

modkit dmr - failed to read tabix index #178

Open lilypeck opened 5 months ago

lilypeck commented 5 months ago

Hello

I am using modkit 0.2.8.

I am getting a very basic error:

> Error! failed to read tabix index "barcode05_E_CHH.bedmethyl.gz.tbi"
>  caused by invalid reference sequence names
>  caused by expected EOF

My script is:

/u/home/l/ldpeck/project-vlsork/longreads/dist/modkit dmr pair \
  -a ${dmrD_1}_${context}.bedmethyl.gz \
  -a ${dmrD_2}_${context}.bedmethyl.gz \
  -b ${dmrC_1}_${context}.bedmethyl.gz \
  -b ${dmrC_2}_${context}.bedmethyl.gz \
  -o dmr/dmp_${run}_${context}.tab \
  --ref /u/home/l/ldpeck/genome_resources/GCF_001633185.2_ValleyOak3.2_genomic.fna \
  --base C \
  -t 24 \
  -f \
  --log-filepath dmr/dmp_${run}.log

However, I don't think the error is related to my .tbi files: I re-ran a script that previously completed successfully with modkit 0.2.7, and it now fails with this error under modkit 0.2.8.

Is there something you can see that might be causing this?

Thank you in advance!

Lily

ArtRand commented 5 months ago

Hello @lilypeck,

There shouldn't be any changes between modkit v0.2.7 and v0.2.8 with respect to how the tabix index is handled. However, I did update the dependencies that modkit uses, so it's possible that it picked up a bug. Could you:

  1. Check if v0.2.7 works on the same input.
  2. Attach the tabix index that is failing to this thread so I can investigate what the problem is.

Thanks.
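
For the first check, one way to run both builds against identical arguments is sketched below. This is only a minimal sketch: `compare_builds` is a hypothetical helper name, and the two binary paths in the usage comment are placeholders rather than real install locations.

```shell
# Run the same arguments through two builds of a tool and report both exit codes.
# compare_builds is a hypothetical helper, not part of modkit.
compare_builds() {
  local old_bin="$1"
  local new_bin="$2"
  shift 2
  local status_old=0
  local status_new=0
  # "|| status=$?" records a failure without aborting the function
  "$old_bin" "$@" || status_old=$?
  "$new_bin" "$@" || status_new=$?
  echo "old: exit $status_old, new: exit $status_new"
  # Return success only when both builds behaved the same way
  [ "$status_old" -eq "$status_new" ]
}

# Usage (placeholder paths, same arguments as the failing command):
#   compare_builds /path/to/modkit-0.2.7 /path/to/modkit-0.2.8 dmr pair ...
```

A non-zero return from the helper means the two builds disagree on the same input, which points at the build rather than at the files.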

lilypeck commented 5 months ago

Hello @ArtRand, thank you for your response! I have just checked with v0.2.7 and I don't get the tabix file error:

> reading reference FASTA at "/u/home/l/ldpeck/genome_resources/GCF_001633185.2_ValleyOak3.2_genomic.fna"
> running single-site analysis
> using default prior, Beta(α: 0.55, β: 0.55)
> estimating max coverages from data
> sampled 4139233 a records and 4027045 b records, calculating max coverages for 95th percentile
> calculated max coverage for a: 24 and b: 30
> running with replicates and matched samples

I have attached two .tbi files which failed. I have also checked the .bedmethyl.gz files and they are complete (with the same tail output as the uncompressed versions).

Thanks

Lily

barcode21_U_CG.bedmethyl.gz.tbi.txt barcode21_U_CHG.bedmethyl.gz.tbi.txt
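
Since the error chain names the reference sequence names, it can help to look at what the index header itself declares. The sketch below assumes the layout given in the tabix index specification: a BGZF-compressed file that starts with the magic `TBI\1`, followed by eight little-endian int32 fields (the first is `n_ref`, the last is `l_nm`), followed by `l_nm` bytes of NUL-terminated names. `tbi_names` is a hypothetical helper name, and `od` reads integers in host byte order, so this assumes a little-endian machine such as x86_64.

```shell
# Print n_ref, l_nm, and the reference names stored in a tabix index header.
# tbi_names is a hypothetical helper; assumes a little-endian host.
tbi_names() {
  local tbi="$1"
  local raw magic n_ref l_nm
  raw=$(mktemp) || return 1
  # BGZF is a valid multi-member gzip stream, so plain gzip can decompress it
  if ! gzip -dc < "$tbi" > "$raw"; then
    echo "cannot decompress $tbi" >&2
    rm -f "$raw"
    return 1
  fi
  # Bytes 0-3 hold the magic "TBI\1"; checking the three printable bytes is enough here
  magic=$(head -c 3 "$raw")
  if [ "$magic" != "TBI" ]; then
    echo "not a tabix index: $tbi" >&2
    rm -f "$raw"
    return 1
  fi
  # n_ref is the int32 at byte offset 4, l_nm is the int32 at byte offset 32
  n_ref=$(od -A n -t d4 -j 4 -N 4 "$raw" | tr -d '[:space:]')
  l_nm=$(od -A n -t d4 -j 32 -N 4 "$raw" | tr -d '[:space:]')
  echo "n_ref=$n_ref l_nm=$l_nm"
  # The names block starts at byte offset 36 (tail counts from 1, hence +37)
  tail -c +37 "$raw" | head -c "$l_nm" | tr '\0' '\n'
  rm -f "$raw"
}

# Usage: tbi_names barcode21_U_CG.bedmethyl.gz.tbi
```

If the number of names printed does not match `n_ref`, or the names look truncated, that narrows the problem down to the header rather than the bin index that follows it.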

ArtRand commented 5 months ago

Hello @lilypeck,

I was able to reproduce the error using noodles version 0.69.0 (the version in modkit 0.2.8); the error does not occur with version 0.50.0 (the version in modkit 0.2.7). What is strange, however, is that the tabix indices that I have in tests and some others I've used seem to be parsed without complaint. Could you tell me what version of tabix you have? This is what I have tested:

tabix --version
tabix (htslib) 1.18
Copyright (C) 2023 Genome Research Ltd.

If you give me a few minutes I can get you a build with the older version of the library to unblock your work, but I'd like to get to the bottom of the problem also. So to summarize, please:

  1. Tell me which version of tabix you have, and show me the script you're using.
  2. (If it's not too large) send me one of the bgzipped bedmethyl files.
  3. If this ends up being a noodles bug, I'd like to open an issue with the noodles developers. Could you give me permission to use your file as an example to exercise the bug?

lilypeck commented 5 months ago

Hello @ArtRand, thank you very much! My tabix version is:

tabix (htslib) 1.19.1
Copyright (C) 2024 Genome Research Ltd.

The complete .bedmethyl is too big to upload, so I have uploaded the first 1 million lines. Or, if you have an email address, I could send you a copy? And yes, I am very happy for you to use these files to exercise the bug.

Thank you very much for your help.

Lily

barcode21_U_CHH.bedmethyl.head.gz
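
A subset like the attached file can be produced roughly as follows. This is a sketch under assumptions: `head_subset` is a hypothetical helper name, the default of 1,000,000 lines simply mirrors the upload described above, and the fallback to plain `gzip` when `bgzip` is not installed is a convenience that yields a file tabix cannot index.

```shell
# Write the first N lines of a (b)gzipped text file to a new compressed file.
# head_subset is a hypothetical helper; N defaults to 1000000.
head_subset() {
  local src="$1"
  local dest="$2"
  local n="${3:-1000000}"
  # gzip -dc can read BGZF input because BGZF is a multi-member gzip stream
  if command -v bgzip >/dev/null 2>&1; then
    # bgzip output stays block-compressed, so the subset can be indexed again
    gzip -dc < "$src" | head -n "$n" | bgzip -c > "$dest"
  else
    # Fallback: ordinary gzip, readable but not indexable with tabix
    gzip -dc < "$src" | head -n "$n" | gzip -c > "$dest"
  fi
}

# Usage: head_subset barcode21_U_CHH.bedmethyl.gz barcode21_U_CHH.bedmethyl.head.gz
```

Because `head` stops reading early, the decompressing `gzip` is usually cut off partway through a large input, so the helper may report failure under `set -o pipefail` even though the output is fine.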

ArtRand commented 5 months ago

Hello @lilypeck,

Alright, I've made a branch (build attached) where I've changed the version back. Please let me know if this build works. I'm going to investigate why the later versions don't work with tabix 1.19.1. Thanks for permission to use your files as well.

modkit_dev9c754d4c_centos7_x86_64.tar.gz

lilypeck commented 4 months ago

Hi @ArtRand, thank you so much, it is working now!

Lily