ldiao / MixMir

A mixed linear model approach to small RNA motif discovery

Can MixMir run with an in-house transcriptome? #6

Open LliliansCalvo opened 4 years ago

LliliansCalvo commented 4 years ago

Hi, I am trying to run MixMir. It now works with your test data, but I get this error with my own data: ('No refseq ID detected', '>TRINITY_GG_86181_c0_g1_i4'). Can MixMir be run with my own FASTA file regardless of its headers? Example FASTA headers:

TRINITY_GG_86181_c0_g1_i4
TRINITY_GG_128934_c0_g1_i8
TRINITY_GG_52087_c0_g1_i1
TRINITY_GG_29872_c0_g1_i3
TRINITY_GG_93846_c0_g1_i2
TRINITY_GG_137216_c0_g1_i7
TRINITY_GG_103307_c1_g1_i3
TRINITY_GG_60964_c0_g1_i6
TRINITY_GG_86691_c1_g1_i7
TRINITY_GG_3149_c0_g1_i12

ldiao commented 4 years ago

Hi, I just took a quick look and that is not currently possible but the code should be pretty easy to amend so that it can accept any header (this code was written 7 years ago for a specific project, unfortunately when I was a graduate student I didn't have the foresight to make it more generalizable!). If you look in parseAll.py you should be able to find where it can be updated so that's no longer a problem.

If you need help with updating the python code, I'm not available to fix this currently but if you give me 1-2 days I'll try to take a look.
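For readers who hit the same 'No refseq ID detected' error, here is a minimal sketch of a FASTA reader that accepts any header. It is not the code in parseAll.py: the function name `read_fasta_any_header` and the choice to use the first whitespace-delimited token as the ID are assumptions made for illustration, and the second record in the example is toy data.

```python
import io


def read_fasta_any_header(handle):
    """Return {sequence_id: sequence} from a FASTA file handle.

    Hypothetical helper, not part of MixMir: the ID is simply the first
    whitespace-delimited token after '>', so no RefSeq pattern is required.
    """
    seqs = {}
    current_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                seqs[current_id] = "".join(chunks)
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no ID")
            current_id = fields[0]
            chunks = []
        else:
            if current_id is None:
                raise ValueError("sequence data before first header")
            chunks.append(line.upper())
    # Store the final record.
    if current_id is not None:
        seqs[current_id] = "".join(chunks)
    return seqs


# Toy input: one Trinity-style header with extra fields, one without.
example = io.StringIO(
    ">TRINITY_GG_86181_c0_g1_i4 strand=+\n"
    "GAATAGGGCACTCG\n"
    "TGCCACTAGACC\n"
    ">TRINITY_GG_128934_c0_g1_i8\n"
    "acgtacgt\n"
)
records = read_fasta_any_header(example)
```

Keying on the first token keeps the IDs short, which makes it easier to match them against the first column of the expression file.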

LliliansCalvo commented 4 years ago

Hi, that's very kind of you.

So I have been trying to “cheat” and add to my sequences what they need to pass that filter and that worked. But I am still getting new errors. My files look like this:

--seqf

phaw_refGene_TRINITY_GG_86181_c0_g1_i4 strand=+
GAATAGGGCACTCGTGCCACTAGACCCCAACTGCAGCGAGGATGAAGCAGAGGACGACGA

--exprf

TRINITY_GG_24814_c0_g1_i1 -2.50388609
TRINITY_GG_143598_c0_g1_i4 -2.504865179
TRINITY_GG_9206_c1_g1_i5 -2.50968568

--mirf

miR-92|LQNS02278089.1_34108_3p Parhyale hawaiensis 34108_3p
AATTGCACTCGTCCCGGCCTGC
miR-92|LQNS02278089.1_34106_3p Parhyale hawaiensis 34106_3p
AATTGCACTGATCCCGGCCTGC
miR-92|LQNS02278089.1_34110_3p Parhyale hawaiensis 34110_3p
AATTGCACTCGTCCCGGCCTTC
miR-184|LQNS02000211.1_1952_3p Parhyale hawaiensis 1952_3p
TGGACGGAGAACTGATAAGGGC
miR-184|LQNS02000211.1_1954_3p Parhyale hawaiensis 1954_3p
TGGACGGAGAACTGATAAGGGC

At this point I do not understand why it wouldn’t work still.

error

Solving MLM with GEMMA
Parsing files for PLINK
Removing 4096 motifs with no information
/mnt/fls01-home01/mqbpwlc2/gridware/share/python/2.7.8/lib/python2.7/site-packages/numpy/lib/function_base.py:392: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
/mnt/fls01-home01/mqbpwlc2/gridware/share/python/2.7.8/lib/python2.7/site-packages/numpy/core/_methods.py:78: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
/mnt/fls01-home01/mqbpwlc2/gridware/share/python/2.7.8/lib/python2.7/site-packages/numpy/lib/function_base.py:2522: RuntimeWarning: Degrees of freedom <= 0 for slice
  c = cov(x, y, rowvar)
/mnt/fls01-home01/mqbpwlc2/gridware/share/python/2.7.8/lib/python2.7/site-packages/numpy/lib/function_base.py:2451: RuntimeWarning: divide by zero encountered in true_divide
  c = np.true_divide(1, fact)
/mnt/fls01-home01/mqbpwlc2/gridware/share/python/2.7.8/lib/python2.7/site-packages/numpy/lib/function_base.py:2451: RuntimeWarning: invalid value encountered in multiply
  c = np.true_divide(1, fact)
Traceback (most recent call last):
  File "/mnt/fls01-home01/mqbpwlc2/privatemodules/MixMir/MixMir.py", line 157, in <module>
    doAll(doKin=doKin)
  File "/mnt/fls01-home01/mqbpwlc2/privatemodules/MixMir/MixMir.py", line 18, in doAll
    runParse(doKin=doKin)
  File "/mnt/fls01-home01/mqbpwlc2/privatemodules/MixMir/MixMir.py", line 43, in runParse
    parseAll.doAll(doKin=doKin,seqf=seqf,exprf=exprf,outfnkin=kinf,outPedFile=outPedFile,outMapFile=outMapFile,kkin=kkin,kmotif=kmotif,frac=frac,useFast=useFast)
  File "/mnt/fls01-home01/mqbpwlc2/privatemodules/MixMir/parseAll.py", line 64, in doAll
    makeKin(dcounts=kin_dcounts,genes=genes,outfn=outfnkin,useFast=useFast)
  File "/mnt/fls01-home01/mqbpwlc2/privatemodules/MixMir/parseAll.py", line 247, in makeKin
    np.savetxt(outfn,K,delimiter='\t',fmt='%.4f')
  File "/mnt/fls01-home01/mqbpwlc2/gridware/share/python/2.7.8/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1377, in savetxt
    "Expected 1D or 2D array, got %dD array instead" % X.ndim)
ValueError: Expected 1D or 2D array, got 0D array instead


ldiao commented 4 years ago

How many motifs are you trying to assess? Is it 4096? For some reason that number of motifs was removed from the rest of the downstream analysis.
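One possible reading of the 4096 figure, offered as a sketch and not as a diagnosis: it equals the number of distinct 6-mers over the alphabet A, C, G, T. If that run used 6-mer motifs (an assumption, since the command for that run is not shown; `K_MOTIF` below is a hypothetical name), "Removing 4096 motifs" would mean that every candidate motif was dropped and nothing was left for the downstream steps.

```python
from itertools import product

K_MOTIF = 6  # assumption: motif length of 6, as in --k_motif 6

# Enumerate every possible k-mer over the four DNA bases.
all_kmers = ["".join(p) for p in product("ACGT", repeat=K_MOTIF)]
n_kmers = len(all_kmers)
print(n_kmers)  # 4096, i.e. 4 ** 6
```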

LliliansCalvo commented 4 years ago

That's odd, I do not understand where that number is coming from. However, since I thought it could be an ID problem, I have now changed all the IDs so they look similar to your example data. It is still not running with my data, although it does run with the test data. Any idea what this could be due to? Thanks!

mm10_refGene_NM_000326734 strand=+
mm10_refGene_NM_000341290 strand=+
mm10_refGene_NM_000379325 strand=+
mm10_refGene_NM_000379317 strand=+
mm10_refGene_NM_000378785 strand=+

NM_000326734 -2.636433599
NM_000341290 -4.948907879
NM_000379325 -6.665454548
NM_000379317 -3.25614215

python MixMir.py --seqf runs/g.ens_IDs.e.3UTR_LC2vsLC3_resSig_down.fa --exprf runs/d.ens_downreg_rank.txt --mirf testdat/testmirs.fa --k_kin 6 --k_motif 6 --N 20 --fast 0 --out testdat/test

Solving MLM with GEMMA
Parsing files for PLINK
Traceback (most recent call last):
  File "MixMir.py", line 157, in <module>
    doAll(doKin=doKin)
  File "MixMir.py", line 18, in doAll
    runParse(doKin=doKin)
  File "MixMir.py", line 43, in runParse
    parseAll.doAll(doKin=doKin,seqf=seqf,exprf=exprf,outfnkin=kinf,outPedFile=outPedFile,outMapFile=outMapFile,kkin=kkin,kmotif=kmotif,frac=frac,useFast=useFast)
  File "/mnt/fls01-home01/mqbpwlc2/privatemodules/MixMir/parseAll.py", line 46, in doAll
    exprs = loadmic(fname=exprf)
  File "/mnt/fls01-home01/mqbpwlc2/privatemodules/MixMir/parseAll.py", line 112, in loadmic
    exprs = [[row[0],float(row[1])] for row in exprs]
IndexError: list index out of range

ldiao commented 4 years ago

Is your expression file tab delimited or space delimited? (should be tab)
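The `IndexError` on `row[1]` in the traceback is consistent with rows that yield only one field when split on tabs. As a rough sketch of how to check and fix an expression file before running MixMir, the helpers below flag rows that are not two tab-separated fields and rewrite whitespace-separated rows with a tab. The function names `find_untabbed_rows` and `to_tab_delimited` are hypothetical and not part of MixMir.

```python
def find_untabbed_rows(lines):
    """Return (line_number, text) for non-blank rows that do not split
    into exactly two tab-separated fields."""
    bad = []
    for n, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        if len(text.split("\t")) != 2:
            bad.append((n, text))
    return bad


def to_tab_delimited(lines):
    """Rewrite 'ID value' rows separated by any whitespace as 'ID<TAB>value'."""
    out = []
    for line in lines:
        fields = line.split()  # splits on spaces and tabs alike
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError("expected 2 fields, got %d: %r" % (len(fields), line))
        float(fields[1])  # raises ValueError if the value is not numeric
        out.append("\t".join(fields))
    return out


# First row is space-delimited, second is already tab-delimited.
rows = ["NM_000326734 -2.636433599\n", "NM_000341290\t-4.948907879\n"]
problems = find_untabbed_rows(rows)
fixed = to_tab_delimited(rows)
```

Here `problems` lists only the space-delimited first row, and `fixed` contains both rows with a single tab between ID and value.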

ldiao commented 4 years ago

Just checking in here--were you able to get your data set to run with tab delimited input data?

LliliansCalvo commented 4 years ago

After changing to tab-delimited input, only one new problem is left. It computes fine, but the output table looks like this:

Solving MLM with GEMMA
Parsing files for PLINK
Removing 0 motifs with no information
/mnt/fls01-home01/mqbpwlc2/gridware/share/python/2.7.8/lib/python2.7/site-packages/numpy/lib/function_base.py:2530: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]
/mnt/fls01-home01/mqbpwlc2/gridware/share/python/2.7.8/lib/python2.7/site-packages/numpy/lib/function_base.py:2531: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[None, :]

Rank  Motif   P-value  P-value (Bonf)  Coef  NUTRs  miRNAs Matched
1     TTTTTT  nan      nan             nan   1716   [A1]miR-LQNS02278075-1-32324-5p
2     GTTTTT  nan      nan             nan   1467
3     ATTTTT  nan      nan             nan   1857
4     CTTTTT  nan      nan             nan   1465
5     TGTTTT  nan      nan             nan   1483   [A1]miR-bantam
6     GGTTTT  nan      nan             nan   1033   [3]miR-981, [3]miR-981, [3]miR-981
7     AGTTTT  nan      nan             nan   1613
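The `c /= stddev[...]` lines in the warnings look like the normalization step of a correlation computation, where dividing by a zero standard deviation yields nan. Whether that is what produces the nan P-values in this table is not established in the thread. As a sketch of one thing to check, the pure-Python helpers below (hypothetical names `pearson` and `constant_columns`, toy data) show how a zero-variance vector gives a nan correlation and how to list such vectors.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns nan when either input has zero variance."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def constant_columns(table):
    """Return names of columns whose values are all identical."""
    return [name for name, values in table.items() if len(set(values)) <= 1]


# Toy k-mer count vectors; geneA has the same count for every k-mer.
counts = {"geneA": [0, 0, 0, 0], "geneB": [1, 0, 2, 1]}
flat = constant_columns(counts)
r = pearson(counts["geneA"], counts["geneB"])
```

If a check like this flags entries in the real input, removing or inspecting those sequences before building the kinship matrix would be a reasonable next step to try.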