jessieren / VirFinder

VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data

defining VF.train.user.R subLengthAll #15

Fzhang1992 closed this issue 4 years ago.

Fzhang1992 commented 5 years ago

Hi @jessieren, I trained VirFinder with 300 bp fragments and got this error:

Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
  cannot open compressed file '/seqTrainKmerCount/VF.trainKmer.tara_bacteria.fa.subLen500.k8.file1.RData', probable reason 'No such file or directory'

I notice that the VF.train.user.R script defines subLengthAll as 0.5 kb, 1 kb, and 3 kb. Does that mean all training sequences need to be longer than 3000 bp?

Another question: I prepared sequences of 10,000 bp for both the training set and the test set (as in Table 1 and Fig. 1A of your article, 10,000 bp). Does the training set need to be split into 0.5 kb, 1 kb, and 3 kb fragments?

jessieren commented 5 years ago

Hi Fzhang,

Thanks for your questions.

Yes, the program fragments the training sequences into 0.5 kb, 1 kb, and 3 kb pieces to train the three models. So not every sequence has to be longer than 3000 bp, but at least some of the input training sequences must be.
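A quick way to check this requirement before training is to count how many whole fragments the training set can supply for each model length. The sketch below is a hypothetical helper, not part of VirFinder: it assumes non-overlapping fragments (so a sequence of length n yields n // L fragments of length L) and only works on sequence lengths. Under that assumption, 300 bp inputs yield no fragments for any of the three lengths, which would be consistent with the missing `subLen500` file in the error above, though the script's actual behavior may differ.

```python
# Hypothetical pre-training check. The fragment lengths mirror the 0.5 kb,
# 1 kb and 3 kb models described above; the floor-division rule assumes
# non-overlapping fragments and may not match VF.train.user.R exactly.
SUB_LENGTHS = (500, 1000, 3000)


def fragments_per_length(seq_lengths, sub_lengths=SUB_LENGTHS):
    """Return {fragment length: total number of whole fragments available}."""
    return {
        sub_len: sum(length // sub_len for length in seq_lengths)
        for sub_len in sub_lengths
    }


def missing_models(seq_lengths, sub_lengths=SUB_LENGTHS):
    """List fragment lengths for which no training fragment can be produced."""
    counts = fragments_per_length(seq_lengths, sub_lengths)
    return [sub_len for sub_len, n in counts.items() if n == 0]


if __name__ == "__main__":
    # 300 bp inputs are shorter than every fragment length: no model gets data.
    print(missing_models([300, 300, 300]))  # [500, 1000, 3000]
    # One sequence longer than 3000 bp is enough to supply a 3 kb fragment.
    print(fragments_per_length([800, 1200, 3500]))  # {500: 10, 1000: 4, 3000: 1}
```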

As for your second question: yes, and VF.train.user should split the input sequences into short fixed-length fragments automatically, so you do not need to split them yourself.

Jessie
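The automatic splitting described in the reply can be sketched as follows. This is an illustration under stated assumptions, not VirFinder's code: `split_into_fragments` and `fragment_training_set` are hypothetical names, fragments are assumed to be consecutive and non-overlapping, and any trailing remainder shorter than the fragment length is assumed to be dropped. VF.train.user.R may handle overlaps or remainders differently.

```python
# Hypothetical sketch of fixed-length fragmentation for the three models.
# Assumes consecutive, non-overlapping fragments with the remainder dropped.


def split_into_fragments(seq, sub_len):
    """Cut seq into consecutive fragments of exactly sub_len characters.

    A trailing remainder shorter than sub_len is discarded, so a sequence
    shorter than sub_len produces no fragments at all.
    """
    if sub_len <= 0:
        raise ValueError("sub_len must be positive")
    return [seq[i:i + sub_len] for i in range(0, len(seq) - sub_len + 1, sub_len)]


def fragment_training_set(seqs, sub_lengths=(500, 1000, 3000)):
    """Map each fragment length to the pooled fragments from all sequences."""
    return {
        sub_len: [frag for s in seqs for frag in split_into_fragments(s, sub_len)]
        for sub_len in sub_lengths
    }


if __name__ == "__main__":
    genome = "ACGT" * 2500  # toy 10,000 bp sequence, as in the question
    for sub_len, pieces in fragment_training_set([genome]).items():
        print(sub_len, len(pieces))  # 500 20, then 1000 10, then 3000 3
```

With a 10,000 bp input, every model length receives fragments, whereas a 300 bp input would leave all three lists empty.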