Closed sr320 closed 2 months ago
Just a heads up, I'm having difficulty finding the MirMiner software. The original paper describing MirMiner doesn't provide any info on how to obtain the software. Additionally, the paper with the pipeline described above doesn't provide any info on how they obtained it, either.
I'm exploring some other options to try to delve into miRNA prediction (e.g. MirMachine (github repo)). And then perhaps BLAST sRNA-seq data and see if/how many reads map to regions?
Cristian's lab does a lot of this, primarily with CLC, but here's a blurb from a paper that might offer options / directions.
Comprehensive Transcriptome Analyses in Sea Louse Reveal Novel Delousing Drug Responses Through MicroRNA regulation
A BLAST analysis was performed to discard other non-coding RNAs (short mRNAs, rRNAs, tRNA) using the specific databases of ncRNAs in NCBI, RFam, and Repbase. The tool “Extract and count” in CLC Genomics was used to identify and extract unique miRNA families annotated by BLAST against the available known miRNAs in all arthropod species miRbase (Griffiths-Jones et al. 2006). The annotation of miRNAs was conducted with the following parameters: additional downstream bases = 2; additional upstream bases = 2; maximum mismatches = 2; missing bases downstream = 2; and missing bases upstream = 2. Novel miRNA prediction was performed using the miRanalyzer software (Hackenberg et al. 2009).
Thanks. Have some BLASTs running ATM against the two miRNA databases.
Have miRNA loci predictions from MirMachine already.
I'll start posting more results as I get them.
miRTrace results:
1 read (yep, just one read) in sample sRNA-POR-82-S1-TP2
matched to the Insects clade family of miRNAs.
No matches to any clade in any other samples.
MirMachine results (predict presence of miRNA families in P.evermanni genome):
Porites_evermanni_v1.fa
NCBI BLASTn results (against miRBase and MirGeneDB databases):
No matches from any samples in either database.
UPDATE:
Currently attempting to run mirdeep2 (GitHub repo - "Discovering known and novel miRNAs from small RNA sequencing data") to see what we can get from that.
mirdeep2 stuff is completed. I'll be sifting through the data in a bit. In the meantime, if you want to glance through some of the HTML reports...
Also, there are PDFs which have actual structural representations:
CSVs if you want to look at those, too:
EDITED: Added correct CSV link for last file.
mirdeep2 summary:
sample | novel miRNA_loci (count) |
---|---|
sRNA-POR-73-S1-TP2 | 342 |
sRNA-POR-79-S1-TP2 | 282 |
sRNA-POR-82-S1-TP2 | 262 |
Mean novel miRNA loci count: 295.3
These are counts of novel miRNA loci with significant randfold p-values.
NOTE: Even within individuals, there are loci which have overlapping coordinates, thus the numbers above are probably higher than they actually should be.
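One way to gauge how much the overlapping coordinates inflate these counts is to merge overlapping loci within a sample before counting. The snippet below is only a minimal sketch: the scaffold names and coordinates are made up, intervals are assumed to be BED-style (0-based, half-open), and strand is ignored. It is not part of the mirdeep2 output or of the analysis above.

```python
from collections import defaultdict


def merge_overlapping_loci(loci):
    """Merge overlapping (chrom, start, end) intervals and return them sorted."""
    by_chrom = defaultdict(list)
    for chrom, start, end in loci:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start < current_end:
                # Half-open intervals overlap when the next start falls before the current end.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        if current_start is not None:
            merged.append((chrom, current_start, current_end))
    return merged


# Hypothetical loci for illustration only (not real mirdeep2 predictions).
example_loci = [
    ("scaffold_1", 100, 160),
    ("scaffold_1", 150, 210),  # overlaps the previous locus
    ("scaffold_1", 500, 560),
    ("scaffold_2", 40, 100),
]
non_redundant = merge_overlapping_loci(example_loci)
print(len(example_loci), "loci ->", len(non_redundant), "after merging")
```

Counting the merged intervals instead of the raw rows would give a per-sample number that is not inflated by overlapping predictions.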
Used bedtools to try to get a summary of "canonical" miRNAs identified in the sRNA-seq data/genome:
/home/shared/bedtools2/bin/intersectBed \
-a result_09_08_2023_t_15_17_52.bed \
-b result_09_08_2023_t_18_36_57.bed result_10_08_2023_t_06_23_41.bed \
-u \
> results-intersect.bed
This yielded 183 "canonical" miRNAs. A couple of caveats:
randfold p-value
Also, with the mirdeep2 analysis, in an effort to get some results quickly, I did not run it with any miRNA database sets; I only ran it for novel miRNA discovery. As noted in the mirdeep2 documentation, utilizing a miRNA database, even one from distantly related species, will generally improve discovery.
I'll try to get that aspect of things run today, but each analysis (without database comparisons) takes about 1.5hrs, so processing the three samples takes about 4.5hrs. I'm guessing the database comparisons will increase that run time.
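A note on reading the bedtools result above: with `-u`, each entry in the `-a` file is written once if it overlaps anything in the `-b` input. With two `-b` files, my understanding is that an `-a` locus is kept if it overlaps a locus in either file, not necessarily both. The sketch below illustrates that difference in plain Python. The function names and coordinates are hypothetical, intervals are assumed to be BED-style half-open, and this is not how bedtools is implemented.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect_any(a_loci, *b_sets):
    """Keep each A locus once if it overlaps a locus in ANY of the B sets."""
    pooled_b = [locus for b_set in b_sets for locus in b_set]
    return [a for a in a_loci if any(overlaps(a, b) for b in pooled_b)]


def intersect_every(a_loci, *b_sets):
    """Stricter variant: keep A loci that overlap a locus in EVERY B set."""
    return [
        a for a in a_loci
        if all(any(overlaps(a, b) for b in b_set) for b_set in b_sets)
    ]


# Hypothetical loci for three samples (illustration only).
sample_a = [("scaffold_1", 100, 160), ("scaffold_1", 500, 560), ("scaffold_2", 40, 100)]
sample_b = [("scaffold_1", 120, 180)]
sample_c = [("scaffold_1", 130, 170), ("scaffold_1", 510, 570)]

shared_with_any = intersect_any(sample_a, sample_b, sample_c)
shared_with_every = intersect_every(sample_a, sample_b, sample_c)
print(len(shared_with_any), "loci overlap at least one other sample")
print(len(shared_with_every), "loci overlap both other samples")
```

If "canonical" is meant as "present in all three samples", the stricter count is the relevant one, and it can only be equal to or smaller than the `-u` count.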
@kubu4 could you link the code that you used for the analysis above?
I haven't had a chance to really write anything up. However, some of it is semi-documented in this notebook post:
https://robertslab.github.io/sams-notebook/2023/08/01/Daily-Bits-August-2023.html
Go to 20230808 to get to the start of some (most?) of the miRNA analysis.
Based on https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.2020.0165
in short: All sncRNA data were converted into FASTQ format and cutadapt v. 1.18 [54] was used to trim the raw reads, setting the minimal quality threshold to PHRED25, removing adaptor sequences and applying a size range of 18–40 nt.
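For reference, the 18–40 nt size selection described in that step can be sketched as below. This only illustrates the length filter: adapter removal and PHRED25 quality trimming are left to cutadapt, and the read names, sequences and helper functions are made up for the example.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ lines (4 lines per record)."""
    line_iter = iter(lines)
    for header in line_iter:
        sequence = next(line_iter).strip()
        next(line_iter)  # skip the '+' separator line
        quality = next(line_iter).strip()
        yield header.strip(), sequence, quality


def size_select(records, min_len=18, max_len=40):
    """Keep reads whose length falls within [min_len, max_len] nt."""
    return [rec for rec in records if min_len <= len(rec[1]) <= max_len]


# Hypothetical already-trimmed reads: 22 nt, 10 nt and 48 nt long.
fastq_lines = [
    "@read_1", "TGAGGTAGTAGGTTGTATAGTT", "+", "I" * 22,
    "@read_2", "A" * 10, "+", "I" * 10,
    "@read_3", "ACGT" * 12, "+", "I" * 48,
]
kept = size_select(parse_fastq(fastq_lines))
print([header for header, _, _ in kept])
```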
[ ] miRNAs obtained from published studies were BLASTed (blastn) against miRBase and MirGeneDB databases.
[ ] Limited to taxa specific sncRNA-seq data, miRTrace v.1.0.0 [55] was used to group similar read sequences into clusters, to verify the quality of each dataset, miRNA size distribution and the presence of possible contaminants, namely miRNAs of different lineages.
[ ] MirMiner [22] was applied to identify bona fide miRNAs and to provide a phylogenetic classification of known miRNAs following up-to-date annotation criteria. In detail: (i) the presence of coverage for both arms of the miRNA sequences, (ii) the distance between the mature and star sequences being lower than 40 nt, (iii) the absence of reads mapped in the surroundings of the annotated miRNAs, (iv) 5′ homogeneity of the mature miRNA, (v) 2 nt overhang and (vi) a reduced free energy.
[ ] The genomic position of each bona fide mussel and clam miRNA was localized using blastn.
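Since MirMiner itself isn't available, two of the criteria listed above, (i) coverage on both arms and (ii) mature/star distance under 40 nt, could be checked by hand on candidates from another tool. The sketch below is purely illustrative and is not MirMiner's implementation. The `Candidate` fields, the definition of "distance" as the gap between the two arms, and the example values are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate record; coordinates are half-open positions on the hairpin."""
    name: str
    mature_start: int
    mature_end: int
    star_start: int
    star_end: int
    mature_reads: int
    star_reads: int


def arm_distance(c):
    """Gap in nt between the two arms (0 if they touch or overlap)."""
    first_arm_end = min(c.mature_end, c.star_end)
    second_arm_start = max(c.mature_start, c.star_start)
    return max(0, second_arm_start - first_arm_end)


def passes_basic_criteria(c, max_distance=40):
    """Check criterion (i), reads on both arms, and (ii), arm distance below max_distance."""
    has_both_arms = c.mature_reads > 0 and c.star_reads > 0
    return has_both_arms and arm_distance(c) < max_distance


# Made-up candidates for illustration only.
candidates = [
    Candidate("cand_a", 0, 22, 40, 62, mature_reads=150, star_reads=12),
    Candidate("cand_b", 0, 22, 70, 92, mature_reads=90, star_reads=5),   # arms 48 nt apart
    Candidate("cand_c", 0, 22, 40, 62, mature_reads=200, star_reads=0),  # no star reads
]
passing = [c.name for c in candidates if passes_basic_criteria(c)]
print(passing)
```

The remaining criteria (flanking reads, 5′ homogeneity, 2 nt overhang, free energy) need read alignments and structure predictions, so they are not covered here.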