miRTop / mirtop

Command-line tool to annotate miRNAs with a standard miRNA/isomiR naming
https://mirtop.readthedocs.org
MIT License

mirtop export to isomiRs #65

Closed DrHogart closed 4 years ago

DrHogart commented 4 years ago

Hi,

Could you please specify exactly which mirtop export format is compatible with the isomiRs package? This page claims that isomiRs can take its input files from mirtop export with the default --format=isomirs. At the same time, this page claims that isomiRs can deal only with .mirna files. Running mirtop export with the default --format option produces a .tsv file:

seq     mir     mism    add     t5      t3      sample_mir6_2
TCTATCACAGTGGCTGTTCTTTT dme-miR-6-3p    2CA     0       TA      t       3
TCTATCACAGTGGCTGTTCTTTTT        dme-miR-6-3p    2CA     0       TA      0       2
ATATCACAGTGGCTGTTCTTTTT dme-miR-6-3p    0       0       A       0       1
CTATCACAGTGGCTGTTCTTTTT dme-miR-6-3p    1CA     0       A       0       7
CTATCACAGTGGCTGTTCTTTT  dme-miR-6-3p    1CA     0       A       t       1
TATCGCAGTGGCTGTTCTTTT   dme-miR-6-3p    5GA     0       0       t       1
AATCACAGTGGCTGTTCTTTTT  dme-miR-6-3p    1AT     0       0       0       1

which cannot be fed into IsomirDataSeqFromFiles:

Error in [.data.frame(table, , c("seq", "freq", "mir", "mism", "add", : undefined columns selected
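
The mismatch behind this error can be checked by comparing the header of the export with the columns named in the message. The sketch below is only an illustration in base R: `needed_cols` lists just the five names visible in the truncated error (the full list expected by the importer is cut off), and both variable names are hypothetical.

```r
# Header of the default export shown above.
export_cols <- c("seq", "mir", "mism", "add", "t5", "t3", "sample_mir6_2")

# Columns visible in the truncated error message; the real list is longer
# but is cut off in the output, so only these five are checked here.
needed_cols <- c("seq", "freq", "mir", "mism", "add")

# Columns requested by the importer but absent from the export.
missing_cols <- setdiff(needed_cols, export_cols)
print(missing_cols)  # "freq" is absent from the default export
```

Under these assumptions, the default export lacks a freq column, which is consistent with the "undefined columns selected" message.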

Running mirtop export with --format=seqbuster produces a .mirna file:

seq     name    freq    mir     start   end     mism    add     t5      t3      s5      s3      DB      precursor       ambiguity
ATATCACAGTGGCTGTTCTTTTT ATATCACAGTGGCTGTTCTTTTT 1       dme-miR-6-3p    miRBasev22      isomiR  0       0       A       0       NA      NA      miRNA   dme-mir-6-3     1
TATCACAGTGGCTGTTCT      TATCACAGTGGCTGTTCT      3       dme-miR-6-3p    miRBasev22      isomiR  0       0       0       tttt    NA      NA      miRNA   dme-mir-6-3     1
TATCACAGTGGCTGTTCTTTT   TATCACAGTGGCTGTTCTTTT   274     dme-miR-6-3p    miRBasev22      isomiR  0       0       0       t       NA      NA      miRNA   dme-mir-6-3     1
TATCACAGTGGCTGTTTTT     TATCACAGTGGCTGTTTTT     1       dme-miR-6-3p    miRBasev22      isomiR  0       TTT     0       ttt     NA      NA      miRNA   dme-mir-6-3     1
TATCACAGTGGCTGTTCTTTTT  TATCACAGTGGCTGTTCTTTTT  691     dme-miR-6-3p    miRBasev22      ref_miRNA       0       0       0       0       NA      NA      miRNA   dme-mir-6-3     1

(with wrong freq, s5 and s3 columns), which is also incompatible with isoSelect from the isomiRs package:

head(isoSelect(ids, mirna="dme-mir-6-3p", 100))
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
  'x' must be an array of at least two dimensions

Is this a bug, or have I missed something?

mirtop 0.4.24.dev0

lpantano commented 4 years ago

Hi,

Thanks for trying this new feature. I think this is still under development in the isomiRs package. Can you take a look at this: http://lpantano.github.io/isomiRs/reference/IsomirDataSeqFromMirtop.html?

You will need to install from github with devtools.

I don't know exactly what causes the other error after importing from .mirna files. Why do you say the freq column is wrong? By default I use NA in s3 and s5 because they are no longer used. If you want, you can send me one .mirna file and I will look at it. It seems to be a bug, but I cannot identify it from this error alone.

Thanks!

DrHogart commented 4 years ago

Brief answer: I expected the freq column to contain the frequency of each read type, but it contains only sequences. I will look at the link tomorrow, thanks.

lpantano commented 4 years ago

If this is the start of the line:

 ATATCACAGTGGCTGTTCTTTTT ATATCACAGTGGCTGTTCTTTTT 1 dme-miR-6-3p miRBasev22 isomiR 0 0 A 0 NA NA miRNA dme-mir-6-3 1

then the third column should be freq (from the header seq name freq ...), which it is. The isoSelect error is probably another bug, which I will be happy to fix :)

DrHogart commented 4 years ago

OK, I see, sorry, the third column (freq) is indeed correct. Why are s3 and s5 not used anymore? I have now installed isomiRs 1.11.4 from GitHub and tried the example from the help page:

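library(readr)
library(isomiRs)
# Locate the example mirtop export shipped with the package
path <- system.file("extra", "mirtop", package="isomiRs")
fn <- list.files(path, full.names = TRUE)
# Sample metadata: one row per sample
de <- data.frame(row.names=c("sample1" , "sample2"),
                 condition = c("cc", "cc"))
# Keep the result so that it can be passed to isoSelect below
ids <- IsomirDataSeqFromMirtop(read_tsv(fn), de)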

Then I ran:

isoSelect(ids, mirna='hsa-miR-127-3p', 0)

that resulted in:

Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions

lpantano commented 4 years ago

OK, I think this is a bug related to the data input. If you can share the file, I can debug it easily. Thanks!

lpantano commented 4 years ago

Oh, I see it is the dummy example; let me take a look.

DrHogart commented 4 years ago

Hi, I did a small investigation, and it seems that the problem is here:

https://github.com/lpantano/isomiRs/blob/e12fb1d563062d5b1d4f6d3014146e71607f1e60/R/AllMethods.R#L132

it should be

DataFrame(dt[ rowSums(dt[,1:ncol(dt), drop = FALSE] > 0 ) > 0, , drop=FALSE])

lpantano commented 4 years ago

Thanks, you are right. It seems we need drop=FALSE there as well, in case there is only one sample, and the selected column range needs to include sample1. Thanks!
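
The failure mode discussed above can be reproduced in base R without isomiRs. The sketch below is only an illustration: the `counts` table, its row names and its single `sample1` column are made up, and it does not reproduce the package's actual code. It shows that selecting a single column without drop = FALSE collapses the result to a vector, which rowSums() rejects, while drop = FALSE keeps it two-dimensional.

```r
# Hypothetical count table with a single sample column (illustrative names).
counts <- data.frame(sample1 = c(3, 0, 7),
                     row.names = c("iso_a", "iso_b", "iso_c"))

# Without drop = FALSE, a one-column selection becomes a plain vector,
# so rowSums() raises an error; capture the message instead of stopping.
res <- tryCatch(rowSums(counts[, 1] > 0),
                error = function(e) conditionMessage(e))
print(res)

# With drop = FALSE, the selection stays a data frame and the filter works.
keep <- rowSums(counts[, 1:ncol(counts), drop = FALSE] > 0) > 0
filtered <- counts[keep, , drop = FALSE]
print(filtered)  # rows with at least one non-zero count
```

The second drop = FALSE, in the row subset, matters for the same reason: without it, a one-column table would be returned as a bare vector and lose its row names.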

DrHogart commented 4 years ago

Thanks, Lorena!