spholmes / F1000_workflow


Error learn question #34

Closed · jellyfish1111 closed this issue 5 years ago

jellyfish1111 commented 5 years ago

Hi, I have a question about the pipeline used in the F1000 paper compared to the "official" pipeline on the dada2 website, concerning the error rates.

In the F1000 manuscript, the code used for learning errors is:

ddF <- dada(derepFs[1:40], err=NULL, selfConsist=TRUE)
ddR <- dada(derepRs[1:40], err=NULL, selfConsist=TRUE)
dadaFs <- dada(derepFs, err=ddF[[1]]$err_out, pool=TRUE)
dadaRs <- dada(derepRs, err=ddR[[1]]$err_out, pool=TRUE)

On the official dada2 site it is:

errF <- learnErrors(filtFs, multithread=TRUE)
errR <- learnErrors(filtRs, multithread=TRUE)
dadaFs <- dada(derepFs, err=errF, multithread=TRUE)
dadaRs <- dada(derepRs, err=errR, multithread=TRUE)

In the F1000 manuscript, is only sample 1 being used to estimate the overall error rates? Is that enough?

Thank you very much for your answer,

Best regards,

Cristina

benjjneb commented 5 years ago

In general, I'd recommend using the dada2 site workflow for the initial ASV processing as it is kept updated with the latest version of the package.

In this case the results are basically the same either way; the newer syntax is just easier to understand. Note also that in the manuscript's code the error rates are being learned from 40 samples (derepFs[1:40]), not just 1.
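For reference, a minimal sketch of the newer approach, assuming filtFs/filtRs are the filtered fastq files from filterAndTrim() and derepFs/derepRs the corresponding dereplicated reads (names taken from the workflow, not prescribed here). learnErrors also trains on a subset of the data, reading samples in order until roughly nbases (1e8 by default) total bases have been accumulated, and plotErrors gives a quick visual check of the fit:

library(dada2)

# Learn the error rates from the filtered reads; samples are consumed until ~nbases bases are used.
errF <- learnErrors(filtFs, multithread=TRUE)
errR <- learnErrors(filtRs, multithread=TRUE)

# Sanity check: the fitted error rates (black lines) should track the observed rates (points).
plotErrors(errF, nominalQ=TRUE)
plotErrors(errR, nominalQ=TRUE)

# Denoise with the learned error models.
dadaFs <- dada(derepFs, err=errF, multithread=TRUE)
dadaRs <- dada(derepRs, err=errR, multithread=TRUE)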

jellyfish1111 commented 5 years ago

Thanks a lot for the quick answer!!!