Provide fasta files of miRNA from each approach

sr320 commented 6 months ago

For each species, denoting which are database matches and which are de novo. Preferably in the repo, if size allows.

kubu4 commented 6 months ago

A.pulchra

MirDeep2

NOTE: ATM, not sure what the distinction is between the three FastAs. Filenames provide a clue, but the documentation doesn't seem to provide specific explanations.

https://github.com/urol-e5/deep-dive/blob/main/D-Apul/output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged/mirna_results_03_04_2024_t_13_00_39/novel_mature_03_04_2024_t_13_00_39_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/D-Apul/output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged/mirna_results_03_04_2024_t_13_00_39/novel_pres_03_04_2024_t_13_00_39_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/D-Apul/output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged/mirna_results_03_04_2024_t_13_00_39/novel_star_03_04_2024_t_13_00_39_score-50_to_na.fa

ShortStack

NOTE: FastA contains only miRNAs identified as database matches. Does NOT contain predicted miRNAs!!

https://github.com/urol-e5/deep-dive/blob/main/D-Apul/output/13.2.1-Apul-sRNAseq-ShortStack-31bp-fastp-merged-cnidarian_miRBase/ShortStack_out/mir.fasta

P.evermanni

MirDeep2

NOTE: ATM, not sure what the distinction is between the three FastAs. Filenames provide a clue, but the documentation doesn't seem to provide specific explanations.

https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/11.1-Peve-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_10_16/novel_mature_22_04_2024_t_15_10_16_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/11.1-Peve-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_10_16/novel_pres_22_04_2024_t_15_10_16_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/11.1-Peve-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_10_16/novel_star_22_04_2024_t_15_10_16_score-50_to_na.fa

ShortStack

NOTE: FastA contains only miRNAs identified as database matches. Does NOT contain predicted miRNAs!!

https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/08.2-Peve-sRNAseq-ShortStack-31bp-fastp-merged/ShortStack_out/mir.fasta

P.meandrina

MirDeep2

NOTE: ATM, not sure what the distinction is between the three FastAs. Filenames provide a clue, but the documentation doesn't seem to provide specific explanations.

https://github.com/urol-e5/deep-dive/blob/main/F-Pmea/output/11.1-Pmea-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_27_22/novel_mature_22_04_2024_t_15_27_22_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/F-Pmea/output/11.1-Pmea-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_27_22/novel_pres_22_04_2024_t_15_27_22_score-50_to_na.fa

https://github.com/urol-e5/deep-dive/blob/main/F-Pmea/output/11.1-Pmea-sRNAseq-miRdeep2-31bp-fastp-merged-cnidarian_miRBase/mirna_results_22_04_2024_t_15_27_22/novel_star_22_04_2024_t_15_27_22_score-50_to_na.fa

ShortStack

NOTE: FastA contains only miRNAs identified as database matches. Does NOT contain predicted miRNAs!!

https://github.com/urol-e5/deep-dive/blob/main/F-Pmea/output/13.2.1-Pmea-sRNAseq-ShortStack-31bp-fastp-merged-cnidarian_miRBase/ShortStack_out/mir.fasta

EDITED: Updated to note that ShortStack FastAs do not contain predicted miRNAs.

sr320 commented 6 months ago

Just going to add this here for later discussion

| File Path | Format | Type | Num Seqs | Sum Len | Min Len | Avg Len | Max Len |
|---|---|---|---|---|---|---|---|
| ../output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged//mirna_results_03_04_2024_t_13_00_39/novel_mature_03_04_2024_t_13_00_39_score-50_to_na.fa | FASTA | DNA | 896 | 19,310 | 17 | 21.6 | 25 |
| ../output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged//mirna_results_03_04_2024_t_13_00_39/novel_pres_03_04_2024_t_13_00_39_score-50_to_na.fa | FASTA | DNA | 896 | 54,056 | 35 | 60.3 | 110 |
| ../output/11.1-Apul-sRNAseq-miRdeep2-31bp-fastp-merged//mirna_results_03_04_2024_t_13_00_39/novel_star_03_04_2024_t_13_00_39_score-50_to_na.fa | FASTA | DNA | 896 | 19,339 | 13 | 21.6 | 31 |
| ../output/13.2.1-Apul-sRNAseq-ShortStack-31bp-fastp-merged-cnidarian_miRBase/ShortStack_out/mir.fasta | FASTA | DNA | 114 | 5,281 | 21 | 46.3 | 98 |
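
For anyone wanting to recompute the columns in the table above for the other species, here is a minimal sketch in plain Python. The function name `fasta_stats` and the toy records are hypothetical, and this is not necessarily how the table above was generated. It only counts records and sequence lengths.

```python
import io


def fasta_stats(handle):
    """Return (num_seqs, sum_len, min_len, avg_len, max_len) for a FASTA stream."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence lines may be wrapped, so lengths are accumulated
            current += len(line)
    if current is not None:
        lengths.append(current)
    if not lengths:
        return (0, 0, 0, 0.0, 0)
    total = sum(lengths)
    avg = round(total / len(lengths), 1)
    return (len(lengths), total, min(lengths), avg, max(lengths))


# Toy records for illustration only (not real miRNA sequences)
toy = io.StringIO(">a\nACGT\n>b\nACGTAC\nGT\n")
print(fasta_stats(toy))
```

To run it on a real file, pass an open file handle instead of the `io.StringIO` object.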

kubu4 commented 6 months ago

I've updated my previous post to indicate that the ShortStack FastAs do not contain predicted miRNAs.

For predicted miRNAs, we'll have to use the Results.gff3 (e.g. https://github.com/urol-e5/deep-dive/blob/main/E-Peve/output/08.2-Peve-sRNAseq-ShortStack-31bp-fastp-merged/ShortStack_out/Results.gff3) to extract FastA.

But, it would probably also be a good idea to only extract predicted miRNAs which have some minimum threshold of read alignments (can be found in column 6).
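
A rough sketch of that filtering step, assuming column 6 of `Results.gff3` holds the read-alignment count as described above. The function name `filter_gff3_by_reads`, the threshold of 10, the feature-type string, and the toy GFF3 lines are all placeholders for illustration; the actual column 3 values should be checked in the real file. The output is BED-style coordinates, which could then be passed with the genome FastA to a sequence-extraction tool (e.g. `bedtools getfasta`).

```python
import io


def filter_gff3_by_reads(handle, min_reads, feature_types=None):
    """Yield (seqid, start0, end, name, reads, strand) for well-supported features.

    Column 6 is treated as the number of read alignments.
    feature_types is an optional set of column 3 values to keep; which
    values are appropriate is an assumption to verify against Results.gff3.
    """
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 directives/comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF3 feature line
        seqid, _source, ftype, start, end, score, strand, _phase, attrs = cols[:9]
        if feature_types is not None and ftype not in feature_types:
            continue
        try:
            reads = float(score)
        except ValueError:
            continue  # "." or another non-numeric score
        if reads < min_reads:
            continue
        name = next((kv[3:] for kv in attrs.split(";") if kv.startswith("ID=")), ".")
        # GFF3 is 1-based inclusive; BED is 0-based half-open
        yield (seqid, int(start) - 1, int(end), name, int(reads), strand)


# Toy GFF3 lines for illustration only (not taken from the repo)
toy_gff3 = (
    "##gff-version 3\n"
    "chr1\tShortStack\tmature_miRNA\t101\t122\t500\t+\t.\tID=Cluster_1.mature\n"
    "chr1\tShortStack\tmature_miRNA\t301\t322\t3\t-\t.\tID=Cluster_2.mature\n"
    "chr2\tShortStack\tUnknown_sRNA_locus\t10\t400\t9000\t.\t.\tID=Cluster_3\n"
)

for rec in filter_gff3_by_reads(io.StringIO(toy_gff3), min_reads=10,
                                feature_types={"mature_miRNA"}):
    print("\t".join(str(field) for field in rec))
```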

sr320 commented 6 months ago

What value is column six? That is, how do we filter? Numbers vary in orders of magnitude.

kubu4 commented 6 months ago

"What value is column six?"

The number of read alignments (can be found in column 6).

Sorry, what do you mean by "how do we filter?"

We'd have to come up with a minimum number of reads that we think provides sufficient support for deciding whether a predicted miRNA locus is accurate.
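
Since the counts vary by orders of magnitude, one way to inform that choice is to look at their distribution before settling on a cutoff. The sketch below is only an illustration: the helper names `read_count_summary` and `loci_passing` and the toy counts are made up, and the candidate thresholds are arbitrary. It bins counts by order of magnitude and reports how many loci survive each candidate threshold.

```python
from collections import Counter


def read_count_summary(counts):
    """Summarise positive read counts: n, median, and loci per order of magnitude."""
    counts = sorted(int(c) for c in counts if c >= 1)
    if not counts:
        return {"n": 0, "median": None, "log10_bins": {}}
    n = len(counts)
    mid = n // 2
    median = counts[mid] if n % 2 else (counts[mid - 1] + counts[mid]) / 2
    # Number of digits minus one gives floor(log10(c)) for positive integers
    bins = Counter(len(str(c)) - 1 for c in counts)
    return {"n": n, "median": median, "log10_bins": dict(sorted(bins.items()))}


def loci_passing(counts, threshold):
    """Count loci whose read count meets or exceeds the threshold."""
    return sum(1 for c in counts if c >= threshold)


# Toy read counts for illustration only (not from Results.gff3)
toy_counts = [3, 12, 45, 150, 2300, 98000]
print(read_count_summary(toy_counts))
for threshold in (10, 100, 1000):  # arbitrary candidate cutoffs
    print(threshold, loci_passing(toy_counts, threshold))
```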

sr320 commented 6 months ago

Sorry, I thought that was a confidence score; I did not realize it was the number of reads.

Which is odd, as you clearly stated "read alignments (can be found in column 6)" :)
