ratschlab / spladder

Tool for the detection and quantification of alternative splicing events from RNA-Seq data.

local variable 'genes2' referenced before assignment #196

Open ls233 opened 7 months ago

ls233 commented 7 months ago

Description

I was trying to optimize a run on a large cohort by using the "--chunked-merge" mechanism. I first generated the individual graphs and then tried to merge them using "--chunked-merge", which failed with the error listed below. The merge runs fine if I omit the "--chunked-merge" parameters, but the cohort is too large for me to get away with that.

What I Did

spladder build --parallel 1 \
    -o /pathTo/spladder \
    -a /pathTo/genome.gtf \
    -b /pathTo/alignments.txt \
    --merge-strat merge_graphs \
    --no-extract-ase \
    --no-quantify-graph \
    --chunked-merge 0 1 0 2
merging level 0 chunk 0 to 0
Traceback (most recent call last):
  File "/pathTo/bin/spladder", line 8, in <module>
    sys.exit(main())
  File "/pathTo/lib/python3.8/site-packages/spladder/spladder.py", line 229, in main
    options.func(options)
  File "/pathTo/lib/python3.8/site-packages/spladder/spladder_build.py", line 88, in spladder
    run_merge(options.samples, options)
  File "/pathTo/lib/python3.8/site-packages/spladder/merge.py", line 253, in run_merge
    merge_genes_by_splicegraph(options, merge_list=merge_list[chunk_start:chunk_end], fn_out=fn)
  File "/pathTo/lib/python3.8/site-packages/spladder/merge.py", line 203, in merge_genes_by_splicegraph
    genes = genes2.copy()
UnboundLocalError: local variable 'genes2' referenced before assignment
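The error itself is Python's generic UnboundLocalError: a name that is assigned somewhere in a function is read before any assignment has run. A minimal, hypothetical illustration (not SplAdder's actual merge code; `merge_chunk` and `merge_chunk_checked` are made-up names) of one plausible way this happens: the variable is only bound inside a loop over the selected chunk, and the chunk is empty, as the "chunk 0 to 0" log line suggests.

```python
from typing import List


def merge_chunk(merge_list: List[List[str]]) -> List[str]:
    """Hypothetical sketch, NOT SplAdder's code: genes2 is only bound inside the loop."""
    for graph_genes in merge_list:
        genes2 = sorted(set(graph_genes))  # the only assignment of genes2
    genes = genes2.copy()  # raises UnboundLocalError if the loop body never ran
    return genes


def merge_chunk_checked(merge_list: List[List[str]]) -> List[str]:
    """Same sketch, but fails early with an explicit message on an empty chunk."""
    if not merge_list:
        raise ValueError("empty chunk selected; check the --chunked-merge level/start/end values")
    return merge_chunk(merge_list)


if __name__ == "__main__":
    empty_chunk = [["gA", "gB"]][0:0]  # an empty slice, like a chunk running "0 to 0"
    try:
        merge_chunk(empty_chunk)
    except UnboundLocalError as err:
        print("UnboundLocalError:", err)
```

The checked variant only shows the idea of validating the chunk range up front; whether SplAdder should do this is for the maintainers to decide.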
akahles commented 7 months ago

Dear @ls233 ,

Thanks for reaching out. How many samples are you working on? I have written a short description of how to use the chunked merge in #136. The issue might stem from the fact that levels are 1-based. That is, if we assume you have 16 samples, you could merge them in two levels with chunksize 4. Thus, you would call SplAdder merge 5 times in total:

... --chunksize 4 --chunked-merge 1 2 0 4 ...
... --chunksize 4 --chunked-merge 1 2 4 8 ...
... --chunksize 4 --chunked-merge 1 2 8 12 ...
... --chunksize 4 --chunked-merge 1 2 12 16 ...
... --chunksize 4 --chunked-merge 2 2 0 4 ...

The second level can only be started once the first level runs are completed. I omitted all other options that are irrelevant for chunking.
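The same call pattern can be generated for other cohort sizes. This is a minimal sketch, not part of SplAdder, and it rests on assumptions inferred from the 16-sample example and from the `merge_list[chunk_start:chunk_end]` line in the traceback: `--chunked-merge` takes `LEVEL MAX_LEVEL START END`, levels are 1-based, ranges are end-exclusive, and each level merges the previous level's outputs in consecutive blocks of `--chunksize`. The helper name `chunked_merge_plan` is hypothetical.

```python
import math
from typing import List, Tuple

Call = Tuple[int, int, int, int]  # (level, max_level, start, end), end-exclusive


def chunked_merge_plan(n_samples: int, chunksize: int) -> List[List[Call]]:
    """Hypothetical helper: per level, the argument tuples for --chunked-merge."""
    if n_samples < 1 or chunksize < 2:
        raise ValueError("need at least one sample and chunksize >= 2")
    # Count how many inputs each level merges: samples first, then chunk outputs.
    sizes = [n_samples]
    while sizes[-1] > chunksize:
        sizes.append(math.ceil(sizes[-1] / chunksize))
    max_level = len(sizes)
    plan = []
    for level, n_items in enumerate(sizes, start=1):  # levels are 1-based
        plan.append([
            (level, max_level, start, min(start + chunksize, n_items))
            for start in range(0, n_items, chunksize)
        ])
    return plan


if __name__ == "__main__":
    for level_calls in chunked_merge_plan(16, 4):
        for call in level_calls:
            print("--chunksize 4 --chunked-merge " + " ".join(map(str, call)))
```

For 16 samples and chunksize 4 this prints the five calls listed above. Calls within one level are independent of each other, but, as noted, a level has to finish before the next one starts.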

Let me know how it goes.

Cheers,

Andre

ls233 commented 5 months ago

Thanks Andre, moving from 0-based to 1-based levels indeed helped. In the other thread you seemed to suggest that the levels were 0-based, so thanks for the clarification.

I'm working with about 1800 samples, and now, with the merge step in the rearview mirror, I'm struggling with the last step: calling the AS events. After running for a few days, the runs just die without producing any error message.

akahles commented 5 months ago

Thanks for getting back to me. If the graphs get very complex, calling events can be a bit of a struggle. We have a few ideas on how to improve this but have not had a chance to try them out yet. In the meantime, I can suggest a few workarounds:

ls233 commented 5 months ago

Thanks, Andre.

  1. call event types independently via --event-types and run them in parallel - we already have this implemented as part of our Snakemake pipeline
  2. skip overly complex genes via --ase-edge-limit - what lower value would you recommend we use?
  3. make sure numba is installed - we already have it installed
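Workarounds 1 and 2 can be combined into one `spladder build` call per event type. This is a minimal sketch under stated assumptions: the event type names are the ones SplAdder is assumed to use by default (verify with `spladder build --help` for your version), the base options are copied from the command earlier in this thread and may need adjusting for the event-calling step, and `build_event_commands`, `run_all`, and the edge limit of 250 are hypothetical placeholders, not a recommended value.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence

# Assumed SplAdder event type names; verify with `spladder build --help`.
EVENT_TYPES = ["exon_skip", "intron_retention", "alt_3prime",
               "alt_5prime", "mutex_exons", "mult_exon_skip"]


def build_event_commands(base_cmd: Sequence[str], event_types: Sequence[str],
                         edge_limit: Optional[int] = None) -> List[List[str]]:
    """Hypothetical helper: one command per event type, optionally with --ase-edge-limit."""
    cmds = []
    for event in event_types:
        cmd = list(base_cmd) + ["--event-types", event]
        if edge_limit is not None:
            # Per this thread, this option makes SplAdder skip overly complex genes.
            cmd += ["--ase-edge-limit", str(edge_limit)]
        cmds.append(cmd)
    return cmds


def run_all(cmds: List[List[str]], max_workers: int, dry_run: bool = True) -> List[int]:
    """Run the commands concurrently and return their exit codes (dry run only prints them)."""
    if dry_run:
        for cmd in cmds:
            print(shlex.join(cmd))
        return [0] * len(cmds)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Threads suffice: each worker just waits on an external process.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, cmds))


if __name__ == "__main__":
    base = ["spladder", "build", "-o", "/pathTo/spladder", "-a", "/pathTo/genome.gtf",
            "-b", "/pathTo/alignments.txt", "--merge-strat", "merge_graphs"]
    run_all(build_event_commands(base, EVENT_TYPES, edge_limit=250), max_workers=6)
```

Keeping one exit code per event type also helps with runs that die silently: on POSIX, `subprocess` reports a process killed by a signal as a negative return code (e.g. -9 for SIGKILL), which at least shows which event type was killed and how.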