nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

training parameters #662

Open mictadlo opened 3 years ago

mictadlo commented 3 years ago

Hi,

  1. I have Illumina paired-end RNA-Seq reads (R1 and R2). Do I need to provide --stranded? If so, how can I find out which value to use?
  2. If I run Trinity on the cluster to speed up the runtime, would it be enough to provide the result via the --trinity parameter alone, or do I still have to provide the -l and -r parameters?
  3. Is it possible to use the output from Mikado as the --trinity parameter? According to its authors, this tool shows

    ..that the accuracy of transcript reconstruction can be boosted by combining multiple methods, and we present a novel algorithm to integrate multiple RNA-seq assemblies into a coherent transcript annotation. Our algorithm can remove redundancies and select the best transcript models according to user-specified metrics while solving common artifacts such as erroneous transcript chimerisms.

Thank you in advance,

Best wishes,

Michal

hyphaltip commented 3 years ago
  1. Whether to use --stranded depends on whether the library prep was strand-specific. If you don't know, you can infer it with a tool such as RSeQC (see infer_experiment.py).
  2. Currently you still need to provide the raw FASTQ files even if you provide an existing Trinity transcript set, because the selection of genes used for training is also conditioned on expression level/coverage.
  3. I think so: if it is just assembled transcripts, you should be able to provide that assembly in place of the Trinity assembly.
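
For point 1, here is a rough sketch of how the paired-end report from infer_experiment.py could be turned into a value for --stranded. It assumes the report has the usual two "Fraction of reads explained by" lines. The function name `guess_stranded` and the 0.8 cutoff are my own illustrative choices, not part of RSeQC or funannotate, so treat the result as a hint rather than a definitive call.

```python
import re


def guess_stranded(report: str, threshold: float = 0.8) -> str:
    """Guess a --stranded value (FR, RF or no) from a paired-end
    infer_experiment.py report. The threshold is an assumed cutoff."""
    # Collect pairs such as ("1++,1--,2+-,2-+", "0.4903") from the report text.
    fractions = dict(re.findall(r'explained by "([^"]+)":\s*([0-9.]+)', report))
    # Read 1 on the same strand as the gene: forward-reverse (FR) library.
    same = float(fractions.get("1++,1--,2+-,2-+", 0.0))
    # Read 1 on the opposite strand (e.g. dUTP protocols): RF library.
    opposite = float(fractions.get("1+-,1-+,2++,2--", 0.0))
    if same >= threshold:
        return "FR"
    if opposite >= threshold:
        return "RF"
    # Neither orientation dominates: treat the library as unstranded.
    return "no"


# Made-up numbers for illustration only, not real data.
example_report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4925
'''

if __name__ == "__main__":
    print(guess_stranded(example_report))
```

The returned value would then be passed as --stranded together with -l, -r and, if you have one, --trinity.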