urol-e5 / deep-dive


Provide fasta files of miRNA from each approach #44

Closed sr320 closed 5 months ago

sr320 commented 5 months ago

For each species, denoting which are database matches and which are de novo. Preferably in the repo, if size allows.

kubu4 commented 5 months ago

A.pulchra

MirDeep2

NOTE: ATM, not sure what the distinction is between the three FastAs. Filenames provide a clue, but the documentation doesn't seem to provide specific explanations.

https://github.com/urol-e5/deep-dive/blob/main/D-Apul/output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged/mirna_results_03_04_2024_t_13_00_39/novel_mature_03_04_2024_t_13_00_39_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/D-Apul/output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged/mirna_results_03_04_2024_t_13_00_39/novel_pres_03_04_2024_t_13_00_39_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/D-Apul/output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged/mirna_results_03_04_2024_t_13_00_39/novel_star_03_04_2024_t_13_00_39_score-50_to_na.fa

ShortStack

NOTE: The FastA contains only the miRNAs identified as database matches. It does NOT contain predicted miRNAs!

https://github.com/urol-e5/deep-dive/blob/main/D-Apul/output/13.2.1-Apul-sRNAseq-ShortStack-31bp-fastp-merged-cnidarian_miRBase/ShortStack_out/mir.fasta


P.evermanni

MirDeep2

NOTE: ATM, not sure what the distinction is between the three FastAs. Filenames provide a clue, but the documentation doesn't seem to provide specific explanations.

https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/11.1-Peve-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_10_16/novel_mature_22_04_2024_t_15_10_16_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/11.1-Peve-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_10_16/novel_pres_22_04_2024_t_15_10_16_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/11.1-Peve-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_10_16/novel_star_22_04_2024_t_15_10_16_score-50_to_na.fa

ShortStack

NOTE: The FastA contains only the miRNAs identified as database matches. It does NOT contain predicted miRNAs!

https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/08.2-Peve-sRNAseq-ShortStack-31bp-fastp-merged/ShortStack_out/mir.fasta


P.meandrina

MirDeep2

NOTE: ATM, not sure what the distinction is between the three FastAs. Filenames provide a clue, but the documentation doesn't seem to provide specific explanations.

https://github.com/urol-e5/deep-dive/blob/main/F-Pmea/output/11.1-Pmea-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_27_22/novel_mature_22_04_2024_t_15_27_22_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/F-Pmea/output/11.1-Pmea-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_27_22/novel_pres_22_04_2024_t_15_27_22_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/F-Pmea/output/11.1-Pmea-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_27_22/novel_star_22_04_2024_t_15_27_22_score-50_to_na.fa

ShortStack

NOTE: The FastA contains only the miRNAs identified as database matches. It does NOT contain predicted miRNAs!

https://github.com/urol-e5/deep-dive/blob/main/F-Pmea/output/13.2.1-Pmea-sRNAseq-ShortStack-31bp-fastp-merged-cnidarian_miRBase/ShortStack_out/mir.fasta

EDITED: Updated to note that the ShortStack FastAs do not contain predicted miRNAs.

sr320 commented 5 months ago

Just going to add this here for later discussion

| File Path | Format | Type | Num Seqs | Sum Len | Min Len | Avg Len | Max Len |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ../output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged//mirna_results_03_04_2024_t_13_00_39/novel_mature_03_04_2024_t_13_00_39_score-50_to_na.fa | FASTA | DNA | 896 | 19,310 | 17 | 21.6 | 25 |
| ../output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged//mirna_results_03_04_2024_t_13_00_39/novel_pres_03_04_2024_t_13_00_39_score-50_to_na.fa | FASTA | DNA | 896 | 54,056 | 35 | 60.3 | 110 |
| ../output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged//mirna_results_03_04_2024_t_13_00_39/novel_star_03_04_2024_t_13_00_39_score-50_to_na.fa | FASTA | DNA | 896 | 19,339 | 13 | 21.6 | 31 |
| ../output/13.2.1-Apul-sRNAseq-ShortStack-31bp-fastp-merged-cnidarian_miRBase/ShortStack_out/mir.fasta | FASTA | DNA | 114 | 5,281 | 21 | 46.3 | 98 |

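
For anyone who wants to reproduce this kind of summary for the other species, here is a minimal sketch that computes the same columns (Num Seqs, Sum Len, Min Len, Avg Len, Max Len) from a FASTA stream. It is not necessarily how the numbers above were generated; the function names `fasta_lengths` and `fasta_stats` and the example sequences are made up for illustration.

```python
from io import StringIO


def fasta_lengths(handle):
    """Yield the length of each sequence in a FASTA stream."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def fasta_stats(handle):
    """Return sequence count and length statistics for a FASTA stream."""
    lengths = list(fasta_lengths(handle))
    if not lengths:
        return {"num_seqs": 0, "sum_len": 0, "min_len": 0,
                "avg_len": 0.0, "max_len": 0}
    total = sum(lengths)
    return {
        "num_seqs": len(lengths),
        "sum_len": total,
        "min_len": min(lengths),
        "avg_len": round(total / len(lengths), 1),
        "max_len": max(lengths),
    }


# Hypothetical two-record FASTA, not taken from the repo.
example = StringIO(">a\nACGTACGTACGTACGTACGTA\n>b\nACGTACGTACGTACGTA\n")
print(fasta_stats(example))
```

To run it on a real file, pass an open file handle instead of the `StringIO` object, e.g. `with open(path) as fh: fasta_stats(fh)`.
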
kubu4 commented 5 months ago

I've updated my previous post to indicate that the ShortStack FastAs do not contain predicted miRNAs.

For predicted miRNAs, we'll have to use the Results.gff3 (e.g. https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/08.2-Peve-sRNAseq-ShortStack-31bp-fastp-merged/ShortStack_out/Results.gff3) to extract FastA.

But, it would probably also be a good idea to only extract predicted miRNAs which have some minimum threshold of read alignments (can be found in column 6).
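
A rough sketch of what that extraction could look like: keep only GFF3 records whose column 6 (read alignments, per the comment above) meets a minimum, then pull the sequence for each kept record from the genome. Several things here are assumptions: the feature type names in column 3 (`mature_miRNA`, `Unknown_sRNA_locus`) should be checked against the actual `Results.gff3`, the genome is a small in-memory dict rather than a real assembly, and the function names, example records, and threshold of 10 are purely illustrative.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def filter_gff3_by_reads(lines, min_reads, feature_types=None):
    """Keep GFF3 records with column 6 >= min_reads (optionally by type)."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed rows
        seqid, _src, ftype, start, end, score, strand, _phase, attrs = fields
        if feature_types is not None and ftype not in feature_types:
            continue
        try:
            reads = int(float(score))
        except ValueError:
            continue  # "." means no value in column 6
        if reads < min_reads:
            continue
        # Use the ID attribute as the record name when present.
        name = next((a[3:] for a in attrs.split(";") if a.startswith("ID=")),
                    f"{seqid}:{start}-{end}")
        kept.append({"name": name, "seqid": seqid, "start": int(start),
                     "end": int(end), "strand": strand, "reads": reads})
    return kept


def to_fasta(records, genome):
    """Format kept records as FASTA; GFF3 coordinates are 1-based, inclusive."""
    out = []
    for r in records:
        seq = genome[r["seqid"]][r["start"] - 1:r["end"]]
        if r["strand"] == "-":
            seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
        out.append(f">{r['name']} reads={r['reads']}\n{seq}\n")
    return "".join(out)


# Hypothetical records and genome, for illustration only.
gff = [
    "##gff-version 3",
    "chr1\tShortStack\tmature_miRNA\t5\t8\t1200\t+\t.\tID=Cluster_1.mature",
    "chr1\tShortStack\tmature_miRNA\t11\t14\t3\t-\t.\tID=Cluster_2.mature",
    "chr1\tShortStack\tUnknown_sRNA_locus\t1\t16\t50000\t.\t.\tID=Cluster_3",
]
genome = {"chr1": "AAAACCGGTTTTACGTAC"}

hits = filter_gff3_by_reads(gff, min_reads=10, feature_types={"mature_miRNA"})
print(to_fasta(hits, genome), end="")
```

For a real genome, an indexed FASTA reader would be a better fit than a dict of strings; the filtering step stays the same.
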

sr320 commented 5 months ago

What value is column six? That is, how do we filter? Numbers vary in orders of magnitude.

kubu4 commented 5 months ago

> What value is column six?

> read alignments (can be found in column 6).

Sorry, what do you mean by "how do we filter?"

We'd have to come up with a minimum number of reads that we think provides sufficient support for a predicted miRNA locus to be considered accurate.
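
Since the counts span orders of magnitude, one way to inform that choice is to tabulate how many predicted loci would survive at several candidate cutoffs. This is only a sketch: the read counts and thresholds below are hypothetical, and the function name `survivors_per_threshold` is made up.

```python
def survivors_per_threshold(read_counts, thresholds):
    """Map each candidate threshold to the number of loci with >= that many reads."""
    return {t: sum(1 for c in read_counts if c >= t) for t in thresholds}


# Hypothetical column 6 values, not taken from any Results.gff3.
counts = [3, 12, 85, 410, 9800, 125000]
for threshold, n in survivors_per_threshold(counts, [1, 10, 100, 1000, 10000]).items():
    print(f"min reads {threshold}: {n} loci kept")
```

Log-spaced thresholds are used because a linear scale would hide most of the variation in data like this.
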

sr320 commented 5 months ago

Sorry, I thought that was a confidence score; I did not realize it was the number of reads.

Which is odd as you clearly stated "read alignments (can be found in column 6)" :)