Closed bitcometz closed 6 years ago
A) Heng and I are tuning parameters so that the default presets work well across sequencing types and genome sizes. If you are comfortable with those parameters, please set them individually. The -x presets
aim to provide a quick start for new users.
B) It is the first. I will change it to
' -X
Jue
I get it now, thanks!
I also have two thoughts, though I am not sure whether they are right:
(1) For PacBio reads, the longest subread is not always the best, according to PacBio's own experience:
Filtering options for your input data for pre-assembly can also be set with the pa_fasta_filter_option flag. The default is streamed-internal-median which uses the median-length subread for each ZMW (sequencing reaction well). Choosing the longest subread can lead to an enrichment in chimeric molecules. Users will rarely need to change this option from the default.
| streamed-internal-median | Applies the median-length ZMW filter only on internal subreads (ZMWs with >= 3 subreads) by running a single pass over the data. The input subreads should be grouped by ZMW. For ZMWs with < 3 subreads, the maximum-length one is selected. |
|---|---|
https://github.com/PacificBiosciences/pb-assembly
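To make the quoted rule concrete, here is a minimal Python sketch of a median-length subread filter. It is not pb-assembly's code: the function name `select_subreads` and the `(zmw_id, sequence)` input format are hypothetical, "internal subreads" is read here as all subreads except the first and last of a ZMW, and the upper median is taken when the count is even. It also collects everything in memory rather than streaming in a single pass.

```python
from collections import defaultdict


def select_subreads(subreads):
    """Pick one subread per ZMW.

    subreads: iterable of (zmw_id, sequence) pairs in sequencing order.
    Returns a dict mapping zmw_id to the chosen sequence.
    """
    by_zmw = defaultdict(list)
    for zmw_id, seq in subreads:
        by_zmw[zmw_id].append(seq)

    chosen = {}
    for zmw_id, seqs in by_zmw.items():
        if len(seqs) >= 3:
            # Assumption: internal subreads exclude the first and last one.
            internal = sorted(seqs[1:-1], key=len)
            # Assumption: upper median when the count is even.
            chosen[zmw_id] = internal[len(internal) // 2]
        else:
            # Fewer than 3 subreads: fall back to the longest one.
            chosen[zmw_id] = max(seqs, key=len)
    return chosen


# Toy input: ZMW 1 has five subreads, ZMW 2 has two.
toy_reads = [
    (1, "A" * 5), (1, "C" * 10), (1, "G" * 12), (1, "T" * 11), (1, "A" * 3),
    (2, "A" * 4), (2, "C" * 9),
]
picked = select_subreads(toy_reads)
print({zmw: len(seq) for zmw, seq in picked.items()})
```

On the toy input, ZMW 1 yields the 11 bp subread (the median of its three internal subreads) rather than the 12 bp one, while ZMW 2 falls back to its longest subread.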
(2) Suppose some genome regions tend to be easily fragmented during DNA extraction. With sequencing prices falling rapidly, it is now routine to sequence a genome to more than 100x. In that case, could choosing only the longest 50x of reads reduce assembly coverage in those regions?
1) Thanks so much for the suggestion. I will try.
2) -X can be specified on the command line; -X 100 will cope with it.
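As a rough illustration of why raising -X helps here, the following Python sketch keeps the longest reads until a target depth is reached. It is only a sketch under assumptions, not wtdbg2's actual selection code: the function name `select_longest_to_depth` is hypothetical, and depth is taken to be total selected bases divided by the genome size given with -g.

```python
def select_longest_to_depth(read_lengths, genome_size, depth):
    """Keep the longest reads until their total bases reach genome_size * depth."""
    target_bases = genome_size * depth
    kept, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break
        kept.append(length)
        total += length
    return kept


# Toy data set: 40x of a 10 kb "genome", mostly made up of short reads.
toy_lengths = [1000] * 50 + [5000] * 30 + [20000] * 10
toy_genome = 10_000

at_20x = select_longest_to_depth(toy_lengths, toy_genome, 20)
at_100x = select_longest_to_depth(toy_lengths, toy_genome, 100)
print(len(at_20x), min(at_20x))
print(len(at_100x), min(at_100x))
```

With the 20x cap, only the ten 20 kb reads survive, so any region represented only by shorter fragments would lose its coverage. With a cap above the available depth, every read is kept.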
https://github.com/ruanjue/wtdbg2/commit/c807560311b459f81920fb4a77a5b322403f01e3
I have added an option, --rdcov-filter <0|1>,
to choose between the longest and the median reads. Please give it a try!
Thanks! I will give it a try and evaluate the result.
Hello, the following is a simple test on the Ath 7 Gbp data:
There is not much difference between the two test results.
Best,
Hello, I noticed there are some new parameters in the latest version of wtdbg2:
(A)
nanopore/ont: -p 19 -AS 2 -s 0.05 -L 10000
sequel/sq: -p 0 -k 15 -AS 2 -s 0.05 -L 10000
The "-A" parameter is set for both Sequel and ONT reads. As mentioned before, the alignments of contained reads have little effect on the assembly results, so why do we have to keep all of these alignments?
(B) -X Choose the best depth for layout (effective with -g) [50]
Does this parameter (-X 50) mean that we choose the longest reads up to 50x depth, align only those, and then assemble? Or do we align all the reads and then choose the best 50x of alignments for assembly? How is "best" defined?
Thanks!