jessieren / DeepVirFinder

Identifying viruses from metagenomic data by deep learning

Error in calculating q-value #2

Open denisbruno1 opened 5 years ago

denisbruno1 commented 5 years ago

Hi! I am using DeepVirFinder to test RNA-seq experiments for virus discovery. When I try to calculate the q-values, with both my own dataset and the CRC_meta test, the following error appears:

Error in smooth.spline(lambda, pi0, df = smooth.df) : missing or infinite values in inputs are not allowed.

One detail: the p-values are 0.0 for all the matches. How can I fix this?

jessieren commented 5 years ago

Hi there,

Thank you for using DeepVirFinder!

Do all the predicted contigs have p-value=0.0? Or did you select the contigs with p-value=0.0 and estimate the q-values only for those?

We recommend estimating q-values for all contigs and then choosing the contigs with small q-values as the predicted viral contigs.

If only the contigs with p-value=0.0 are fed into the q-value estimation function, their distribution may violate the model's assumptions, which could be what causes the error. See a similar discussion in the qvalue package: https://github.com/StoreyLab/qvalue/issues/9
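To illustrate the "estimate on all contigs, then filter" order recommended above, here is a minimal Python sketch. It is not the Storey q-value method used by the qvalue package; it swaps in the simpler Benjamini-Hochberg adjustment as a stand-in, and the function name `bh_qvalues`, the example p-values, and the 0.1 cutoff are all hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (a stand-in, not Storey's q-value)."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for ALL contigs, not a pre-filtered subset.
all_pvalues = [0.01, 0.04, 0.03, 0.5]
all_qvalues = bh_qvalues(all_pvalues)

# Only after the adjustment, keep the contigs below a chosen cutoff.
cutoff = 0.1  # hypothetical threshold
selected = [i for i, q in enumerate(all_qvalues) if q <= cutoff]
```

The point of the sketch is the ordering: the adjustment sees the full set of p-values, and the selection happens afterwards on the adjusted values.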

Thank you!

Jie

denisbruno1 commented 5 years ago

Thank you for the quick answer! All the predicted contigs have p-value=0.0, so I think this may be an error on my side. I tried the approach described in the StoreyLab issue, but it did not work, precisely because the p-values are all equal to 0. The qvalue package reports the following:

p-values not in valid range [0, 1].

Curiously, this happens not only with my dataset but also with the CRC_meta test. Did I fail to install something correctly?

jessieren commented 5 years ago

Thank you for providing the details. Is your data mostly from viruses? If not, it is strange that all contigs have p-value=0.

How about the crAssphage example? What score do you get?

python dvf.py -i ./test/crAssphage.fa -o ./test/ -l 300

The score should be something like:

name	len	score	pvalue
gi|674660337|ref|NC_024711.1| Uncultured phage crAssphage, complete genome	97065	0.9978806972503662	0.004702016768638115

Jie


From: denisbruno1 notifications@github.com Sent: Wednesday, March 6, 2019 3:54:56 AM To: jessieren/DeepVirFinder Cc: Jie Ren; Comment Subject: Re: [jessieren/DeepVirFinder] Error in calculating q-value (#2)

denisbruno1 commented 5 years ago

When I tried this, I got p=0.0 again, as shown below:

name	len	score	pvalue
gi|674660337|ref|NC_024711.1| Uncultured phage crAssphage, complete genome	97065	0.99788069725 	0.0
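Before passing p-values to any q-value routine, it can help to summarize the `pvalue` column and see how many entries are exactly zero, non-finite, or outside [0, 1]. The following is a minimal Python sketch of such a check. It assumes the prediction table is tab-separated with the header `name`, `len`, `score`, `pvalue` shown in this thread; the function name `summarize_pvalues` and the example rows are hypothetical and not part of DeepVirFinder.

```python
import csv
import io
import math


def summarize_pvalues(table_text, column="pvalue"):
    """Summarize the p-value column of a tab-separated prediction table."""
    # Assumption: fields are separated by tabs and the first row is a header.
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    pvalues = [float(row[column]) for row in reader]
    finite = [p for p in pvalues if math.isfinite(p)]
    return {
        "n": len(pvalues),
        "n_non_finite": len(pvalues) - len(finite),
        "n_zero": sum(1 for p in finite if p == 0.0),
        "n_out_of_range": sum(1 for p in finite if p < 0.0 or p > 1.0),
        "min": min(finite) if finite else None,
        "max": max(finite) if finite else None,
    }


# Hypothetical table in which every p-value is 0.0, as reported in this thread.
example = (
    "name\tlen\tscore\tpvalue\n"
    "contig_1\t97065\t0.9978\t0.0\n"
    "contig_2\t1200\t0.1200\t0.0\n"
)
summary = summarize_pvalues(example)
```

If `n_zero` equals `n`, as in this example, the problem most likely lies in how the p-values were computed rather than in the q-value step itself.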

ursadhip commented 2 years ago

Data scientist Sean Hackett @shackett has provided a resolution to this error.

Please visit: https://github.com/StoreyLab/qvalue/issues/19

Though I don't understand much of the statistical know-how, it worked for me.

Best