mehrdadbakhtiari / adVNTR

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data
http://advntr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

sqlite3.OperationalError: disk I/O error #45

Open Parsoa opened 3 years ago

Parsoa commented 3 years ago

Hi Mehrdad,

I'm trying to run advntr on some PacBio reads mapped to hg38, using the hg38 models (hg38_selected_VNTRs_Illumina.db). However, I get a sqlite3.OperationalError when running with the following options:

advntr genotype --alignment_file HG00733.sorted.bam --working_directory $PWD --pacbio -r hg38.no_alt.fa --models vntr_data/hg38_selected_VNTRs_Illumina.db --outfile advntr.vcf --outfmt vcf
Using Theano backend.
Traceback (most recent call last):
  File "/software/anaconda3/4.9.2/lssc0-linux/envs/advntr-1.4.1/bin/advntr", line 11, in <module>
    sys.exit(main())
  File "/software/anaconda3/4.9.2/lssc0-linux/envs/advntr-1.4.1/lib/python3.6/site-packages/advntr/__main__.py", line 134, in main
    genotype(args, genotype_parser)
  File "/software/anaconda3/4.9.2/lssc0-linux/envs/advntr-1.4.1/lib/python3.6/site-packages/advntr/advntr_commands.py", line 104, in genotype
    reference_vntrs = load_unique_vntrs_data()
  File "/software/anaconda3/4.9.2/lssc0-linux/envs/advntr-1.4.1/lib/python3.6/site-packages/advntr/models.py", line 140, in load_unique_vntrs_data
    right_flanking, repeats, scaled_score FROM vntrs''')
sqlite3.OperationalError: disk I/O error

All the files referenced in the command exist. I tried redownloading the VNTR models in case they were corrupted. I get the same error if I use the hg19 ones. Any suggestions? Thanks.

mehrdadbakhtiari commented 3 years ago

Hi Parsoa,

I think it will work if you use the PacBio database file (hg19_selected_VNTRs_Pacbio.db) instead of hg38_selected_VNTRs_Illumina.db. It has other loci that work with --pacbio and that don't exist in hg38_selected_VNTRs_Illumina.db.

We haven't tested adVNTR with PacBio reads on GRCh38. It should work, but we don't have test data or models for it. Let me know if you specifically have GRCh38 data and I can look into how to make it work.

Parsoa commented 3 years ago

I tried hg19_selected_VNTRs_Pacbio.db too, but I still get the same error (I'm not sure it would have worked anyway, as my sample is mapped to hg38). I get the same error even if I don't pass --models.
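
Since the error appears with every model file, one way to narrow it down is to read the database directly with Python's standard sqlite3 module, outside adVNTR. The following is only a minimal sketch: the helper name check_vntr_db is hypothetical and not part of adVNTR, and the only detail taken from the traceback above is the table name vntrs. If this check also fails with a disk I/O error, the cause is more likely at the SQLite or filesystem level than in the adVNTR options.

```python
import pathlib
import sqlite3
import sys


def check_vntr_db(db_path):
    """Hypothetical helper: try to read the vntrs table; return (ok, message)."""
    # Build a file: URI from the absolute path and open it read-only,
    # so SQLite never creates an empty database if the path is wrong.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        return False, "open failed: {}".format(exc)
    try:
        # Same table that the adVNTR traceback shows being queried.
        count = conn.execute("SELECT COUNT(*) FROM vntrs").fetchone()[0]
        return True, "read {} rows from vntrs".format(count)
    except sqlite3.Error as exc:
        return False, "query failed: {}".format(exc)
    finally:
        conn.close()


if __name__ == "__main__":
    # Usage: python check_db.py vntr_data/hg38_selected_VNTRs_Illumina.db
    if len(sys.argv) > 1:
        ok, message = check_vntr_db(sys.argv[1])
        print(("OK: " if ok else "FAILED: ") + message)
```

Running it against a copy of the same .db file on a different disk (for example a local scratch directory, if one is available) would show whether the storage location makes a difference.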