jessieren / VirFinder

VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data
Other

Train VirFinder with new genomes #10

alyosama closed this issue 4 years ago

alyosama commented 5 years ago

Hi @jessieren

I tried to train VirFinder with new genomes, but it fails with this error:

Error in save(seqTrainKmerCount, file = file.path(seqTrainKmerCountDir, :
  error writing to connection
Calls: VF.train.user -> trainDataCollect -> save
In addition: Warning message:
'rBind' is deprecated.
 Since R version 3.2.0, base's rbind() should work fine with S4 objects
Execution halted

What should I do?

Thanks, Aly

jessieren commented 5 years ago

Hi @alyosama ,

Thank you very much for your interest in VirFinder.

Could you please send me a sample of your training data so that I can debug on my computer?

Jessie

alyosama commented 5 years ago

Thanks @jessieren, I have fixed this issue on my machine.

Anyway, I have noticed that the training process is very slow (it took days). Do you have any suggestions for making it faster?

jessieren commented 5 years ago

Hi Aly,

I am sorry for my slow response!

The training process first fragments the input genomes into short contigs and counts the k-mers in each contig. After processing the contigs, it subsamples the same number of positive and negative contigs and trains a machine learning model on their k-mer frequencies.
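A minimal Python sketch of the data-preparation steps described above, for readers who want to see the order of operations. This is not VirFinder's code (VirFinder is an R package); the function names `fragment`, `kmer_frequencies` and `build_training_set`, the non-overlapping fragmentation, and the example contig length and k are all hypothetical choices. The model-fitting step is left out. Note that, as described, k-mers are counted for every contig before the classes are balanced.

```python
import random
from collections import Counter
from itertools import product


def fragment(seq, contig_len):
    """Cut a genome into non-overlapping contigs of exactly contig_len bases (the remainder is dropped)."""
    return [seq[i:i + contig_len] for i in range(0, len(seq) - contig_len + 1, contig_len)]


def kmer_frequencies(contig, k):
    """Relative frequencies of all 4**k DNA k-mers in a fixed order; k-mers with non-ACGT bases are ignored."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(contig[i:i + k] for i in range(len(contig) - k + 1))
    total = sum(counts[km] for km in kmers)
    return [counts[km] / total if total else 0.0 for km in kmers]


def build_training_set(pos_genomes, neg_genomes, contig_len, k, seed=0):
    """Fragment, count k-mers for every contig, then subsample equal numbers per class."""
    # Steps 1-2: every contig of every genome is processed, which is where the time goes.
    pos = [kmer_frequencies(c, k) for g in pos_genomes for c in fragment(g, contig_len)]
    neg = [kmer_frequencies(c, k) for g in neg_genomes for c in fragment(g, contig_len)]
    # Step 3: keep the same number of contigs from each class.
    n = min(len(pos), len(neg))
    rng = random.Random(seed)
    pos = rng.sample(pos, n)
    neg = rng.sample(neg, n)
    X = pos + neg
    y = [1] * n + [0] * n  # 1 = positive (viral), 0 = negative
    return X, y
```

With a negative set many times larger than the positive set, most of the `neg` feature vectors computed in steps 1-2 are thrown away in step 3, which is the waste described in the next paragraph.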

Training can be slow if your negative set (for example, bacterial genomes) is much larger than your positive set. I would suggest shrinking the negative set by randomly downsampling a subset of its sequences and using that subset as the input. That way, VirFinder does not waste time processing every sequence in the negative set only to subsample a few of them later to match the number of positive contigs. Does that make sense, or is this not your situation?
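A minimal sketch of that downsampling step in Python, assuming the negative genomes are available as FASTA text. Since the number of contigs a genome yields is roughly its length divided by the contig length, the sketch matches total base pairs rather than sequence counts. The helper names and the `ratio` parameter (to keep some margin above the positive set) are hypothetical, not VirFinder options.

```python
import random


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def downsample_negatives(neg_records, pos_total_bp, ratio=1.0, seed=0):
    """Randomly keep negative records until their total length reaches ratio * pos_total_bp."""
    rng = random.Random(seed)
    shuffled = neg_records[:]
    rng.shuffle(shuffled)
    target = ratio * pos_total_bp
    kept, total = [], 0
    for header, seq in shuffled:
        if total >= target:
            break
        kept.append((header, seq))
        total += len(seq)
    return kept


def to_fasta(records, width=60):
    """Serialize (header, sequence) tuples back to FASTA with fixed line width."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The downsampled FASTA written by `to_fasta` would then replace the full negative file as the training input.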

Thanks!

Best wishes, Jessie


From: Aly O. Abdelkareem notifications@github.com
Sent: Sunday, September 23, 2018 10:07:38 PM
To: jessieren/VirFinder
Cc: Jie Ren; Mention
Subject: Re: [jessieren/VirFinder] Train VirFinder with new genomes (#10)