mehrdadbakhtiari / adVNTR

A tool for genotyping Variable Number Tandem Repeats (VNTR) from sequence data
http://advntr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Issues running advntr genotype #35

Closed: aob93 closed this issue 3 months ago

aob93 commented 3 years ago

Hi,

After installing the package, I'm having difficulty getting the genotype command to run on some PacBio BAM files.

Running the following:

```
advntr genotype -p -a NA07019_SC034768.bam --update -m vntr_data/hg19_VNTRs.db -o bed --working_directory test/
```

Produces:

```
Traceback (most recent call last):
  File "/opt/miniconda2/bin/advntr", line 11, in <module>
    sys.exit(main())
  File "/opt/miniconda2/lib/python2.7/site-packages/advntr/main.py", line 121, in main
    genotype(args, genotype_parser)
  File "/opt/miniconda2/lib/python2.7/site-packages/advntr/advntr_commands.py", line 91, in genotype
    reference_vntrs = load_unique_vntrs_data()
  File "/opt/miniconda2/lib/python2.7/site-packages/advntr/models.py", line 125, in load_unique_vntrs_data
    right_flanking, repeats, scaled_score FROM vntrs''')
sqlite3.OperationalError: no such table: vntrs
```

I'm at a bit of a loss as to what the issue is. Any help is greatly appreciated.

mehrdadbakhtiari commented 3 years ago

Hello, could you please check the contents of the vntr_data you downloaded? I think you should use either `-m hg19_selected_VNTRs_Pacbio.db` or `-m hg19_genic_VNTRs.db` (probably not `-m vntr_data/hg19_VNTRs.db`), depending on which vntr_data you have downloaded. Let me know how it goes; I'd be happy to help if the issue persists.
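Following the suggestion to check the contents of vntr_data, here is a minimal Python sketch (not part of adVNTR) that lists the tables in each `.db` file and reports whether the `vntrs` table from the failing query is present. It assumes the model databases are plain SQLite files, which the `sqlite3.OperationalError` in the traceback suggests. It also guards against a standard-library behavior worth knowing: `sqlite3.connect()` creates a new, empty database when the path does not exist, so a wrong `-m` path could plausibly surface as `no such table: vntrs` rather than as a missing-file error (an assumption; I haven't checked how adVNTR opens the file). The script name `check_vntr_db.py` and the helper names are hypothetical, and the code is written to run under both Python 2.7 and Python 3.

```python
import os
import sqlite3
import sys


def list_tables(db_path):
    """Hypothetical helper (not part of adVNTR): sorted table names in a SQLite file."""
    # sqlite3.connect() silently creates an empty database when the path
    # does not exist, so check first instead of leaving a stray empty file.
    if not os.path.isfile(db_path):
        raise IOError("no such database file: %s" % db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def has_vntrs_table(db_path):
    """True if the file contains the `vntrs` table queried in the reported traceback."""
    return "vntrs" in list_tables(db_path)


if __name__ == "__main__":
    # Usage: python check_vntr_db.py vntr_data/*.db
    for path in sys.argv[1:]:
        try:
            print("%s -> %s" % (path, list_tables(path)))
        except IOError as err:
            print("%s -> %s" % (path, err))
```

Running `python check_vntr_db.py vntr_data/*.db` prints each file's tables; a path that does not exist, or a file whose list lacks `vntrs`, is probably not the database the `genotype` command expects.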

aob93 commented 3 years ago

Thanks! That seems to have worked, although the command now runs without errors but produces no output, so I assume I'm still not quite there.

Also, I can only install and run v1.2 via conda. v1.4 would be my preference but installing from source has not proved fruitful. Any suggestions?

mehrdadbakhtiari commented 3 years ago

Great! Please use `--outfmt bed --outfile sample.bed` to get the output. There is a typo in the command you wrote, so it doesn't properly specify the output format and output file.

With regard to the version: are you using Linux? There have been some issues with conda on Linux not installing the latest version, and many packages are affected. I'll update you once it is solved. Meanwhile, you can try installing from source by running only these commands. Since you have already installed the older version with conda, you shouldn't have any problems with requirements, and `python setup.py install` should take care of everything. I haven't tried this myself, but I think it's worth a shot.

aob93 commented 3 years ago

Will do!

I'm on a Mac. Installing from source and then calling advntr results in the following error:

```
ImportError: Keras requires TensorFlow 2.2 or higher. Install TensorFlow via `pip install tensorflow`
```

I believe TensorFlow 2.2 is incompatible with Python 2.

mehrdadbakhtiari commented 3 years ago

Sorry about the confusion. I thought you were using Linux; conda was working on Mac when I tested it. @Jong-hun-Park, can you look into the issue where conda doesn't install the latest version on Mac? Thanks!

aob93 commented 3 years ago

Hi again Mehrdad,

I've had a bit more success after installing v1.3 from source.

I'm trying to genotype two custom VNTRs, and I'm happy that `advntr addmodel` is running properly. `advntr genotype`, however, is producing strange results.

Here is my command and output:

```
advntr genotype -a NA07019_SC034768.clean.bam -m vntr_data/hg38_VNTRs.db -p --working_directory test
```

```
[M::bam2fq_mainloop] discarded 0 singletons
[M::bam2fq_mainloop] processed 0 reads
/opt/miniconda2/lib/python2.7/site-packages/advntr-1.3.3-py2.7-macosx-10.7-x86_64.egg/advntr/vntr_finder.py:401: RuntimeWarning: invalid value encountered in double_scalars
1
None
2
None
```

Looking at the .log file, it is clear that the tool finds and counts the repeats but has trouble converting the counts into a genotype: the log reports `Maximum probability for genotyping: 1e-20` for both VNTRs I'm querying.

Is this something you have come across before, and do you have any suggestions for getting a genotype?

Thanks again!

mehrdadbakhtiari commented 3 years ago

Hello, I'm glad you could finally install a newer version. Yes, this output means adVNTR is unable to genotype these two loci. Could you please email me the log file so I can take a look (at least the parts that don't contain read sequences, if those aren't public)? For now, my guess is that either these loci have a very short pattern and a high number of repeats (a long overall length), or there is more than one copy of these VNTRs in the genome, making it hard to distinguish between them. If it's something else, I'd be happy to look further and fix the issue.

aob93 commented 3 years ago

Thanks again for your help, much appreciated!

Here’s an example of a log file. There are no read sequences, just motif counts, so all good!

Best, Aidan

From: Mehrdad Bakhtiari notifications@github.com Reply-To: mehrdadbakhtiari/adVNTR reply@reply.github.com Date: Tuesday, October 13, 2020 at 3:16 PM To: mehrdadbakhtiari/adVNTR adVNTR@noreply.github.com Cc: aob93 aidanobrien93@hotmail.com, Author author@noreply.github.com Subject: Re: [mehrdadbakhtiari/adVNTR] Issues running advntr genotype (#35)



mehrdadbakhtiari commented 3 years ago

Hi Aidan,

Sorry, but it seems that GitHub doesn't include attachments when you respond by email. Could you send it directly by email to mehrdad.baxtiari@gmail.com? Thanks!