mhorlbeck / ScreenProcessing


StopIteration error when running `process_experiments.py` #27

Open yeroslaviz opened 1 year ago

yeroslaviz commented 1 year ago

Hi Max,

I'm getting an error I can't pinpoint when running the process_experiments.py script.

I have modified my config file according to my needs, I can see the count files, and I can start creating the plots. When I run the script, it starts correctly:

$ python $process_experiments       P654_Human_experiment_config_file.txt       library_tables/

No growth values--all phenotypes will be reported as log2enrichments

Accessing library information
Loading counts data
Merging experiment counts split across lanes/indexes
-generating sgRNA read count histograms
/fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P654/P654_screenProcessing/Human_ScreenProcessing/P654_Human_plots/000_fig_counts_hist.png
/fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P654/P654_screenProcessing/Human_ScreenProcessing/P654_Human_plots/001_fig_counts_hist.png
/fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P654/P654_screenProcessing/Human_ScreenProcessing/P654_Human_plots/002_fig_counts_hist.png
/fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P654/P654_screenProcessing/Human_ScreenProcessing/P654_Human_plots/003_fig_counts_hist.png
/fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P654/P654_screenProcessing/Human_ScreenProcessing/P654_Human_plots/004_fig_counts_hist.png
/fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/P654/P654_screenProcessing/Human_ScreenProcessing/P654_Human_plots/005_fig_counts_hist.png
Computing sgRNA phenotype scores
-generating phenotype histograms and scatter plots

but then I get the following error:

Traceback (most recent call last):
  File "/fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/ScreenProcessing-master/process_experiments.py", line 638, in <module>
    processExperimentsFromConfig(
  File "/fs/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/ScreenProcessing-master/process_experiments.py", line 163, in processExperimentsFromConfig
    screen_analysis.countsScatter(tempDataDict, condition1, replicate, condition2, replicate,
  File "/fs/gpfs41/lv02/fileset01/pool/pool-cox-projects-bioinformatics/AG_Murray/Monika/ScreenProcessing-master/screen_analysis.py", line 144, in countsScatter
    result = axis.scatter(np.log2(data['counts'].loc[:, (condition_x, replicate_x)] + 1),
  File "/fs/home/yeroslaviz/miniconda3/envs/screenProcessing/lib/python3.10/site-packages/matplotlib/__init__.py", line 1423, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/fs/home/yeroslaviz/miniconda3/envs/screenProcessing/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 4538, in scatter
    self._parse_scatter_color_args(
  File "/fs/home/yeroslaviz/miniconda3/envs/screenProcessing/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 4336, in _parse_scatter_color_args
    and isinstance(cbook._safe_first_finite(c), str)))
  File "/fs/home/yeroslaviz/miniconda3/envs/screenProcessing/lib/python3.10/site-packages/matplotlib/cbook/__init__.py", line 1749, in _safe_first_finite
    return next(val for val in obj if safe_isfinite(val))
StopIteration

It creates the first six plots (000 to 005) but no more.

I don't know whether this has something to do with the fact that in the config file I have this entry for the sgRNA analysis:

###################################
##                     sgRNA Analysis                       ##
###################################
[sgrna_analysis]
condition_string =
#   gamma:MH:VH
    rho:MH:VH
#   tau:MH:VH

This is because I don't truly have samples with and without the vector but, in a way, two different T0 groups whose fractions I would like to compare. But this is not the main issue. The question is: can the script run with only one of the three parameters given, or do I have a different problem with my data that I can't identify?

thanks for the information

Assa

yeroslaviz commented 1 year ago

ok, solved it.

the problem was with my library file listing the guides.

thanks

rosman83 commented 1 year ago

Getting this error as well -

line 144, in countsScatter
    result = axis.scatter(np.log2(data['counts'].loc[:, (condition_x, replicate_x)] + 1),
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

StopIteration

@yeroslaviz Could you explain in detail how you solved this? Thanks!

mhorlbeck commented 1 year ago

I'm sorry I missed this the first time. @yeroslaviz, I'm curious to hear how you solved this as well. Which library file were you using?

My interpretation of the error message is that something non-finite is being passed to the color parsing in that scatter plot. I'm not able to reproduce this error though. Can you:

Thanks!
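To illustrate the interpretation above: the last frame of the traceback is a generator expression wrapped in `next(...)`, which raises `StopIteration` whenever nothing in the input passes the finiteness check. The sketch below mimics that one line in plain Python. It is not matplotlib's code: `first_finite` and `try_first_finite` are hypothetical helper names, and `math.isfinite` stands in for matplotlib's own check.

```python
import math


def first_finite(values):
    """Mimic the generator expression from the traceback:
    return the first finite value, or let StopIteration escape."""
    return next(v for v in values if math.isfinite(v))


def try_first_finite(values):
    """Hypothetical wrapper that reports the exception instead of crashing."""
    try:
        return first_finite(values)
    except StopIteration:
        return "StopIteration"


nan = float("nan")
print(try_first_finite([nan, 0.5, 1.2]))  # one finite value is enough
print(try_first_finite([nan, nan, nan]))  # all values non-finite
print(try_first_finite([]))               # nothing to plot at all
```

If the values handed to the scatter plot's color argument are all NaN, or the selection is empty, the same bare `StopIteration` would surface, which may be why the message gives no hint about the input data.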

yeroslaviz commented 8 months ago

Hi Max,

I'm truly sorry it took me almost half a year to respond. Somehow I missed the two comments on my issue. Now that I have a new data set and have run into the same error again, I came back here to see what I did. 🙈 I must admit, my comment was not really helpful, and I apologize for that.

So here is the whole solution:

The error shown above is what I get when the table with the guides is not correct. The problem is with the control guides. I honestly don't know what exactly the problem is, as the errors are very cryptic, but I was happy to solve it. The table with the guides originally looked like this:

sgID    sublibrary  gene    transcripts sequence
...
3-KO-2  Mouse   Ucp3    all XXXXXXXXXXXXXXXXXXX
3-KO-3  Mouse   Ucp3    all XXXXXXXXXXXXXXXXXXX
3-KO-4  Mouse   Ucp3    all XXXXXXXXXXXXXXXXXXX
Control-KO-1-F  Mouse   Control-KO-1-F  negative_control    XXXXXXXXXXXXXXXXXXX
Control-KO-2-F  Mouse   Control-KO-2-F  negative_control    XXXXXXXXXXXXXXXXXXX
Control-KO-3-F  Mouse   Control-KO-3-F  negative_control    XXXXXXXXXXXXXXXXXXX
Control-KO-4-F  Mouse   Control-KO-4-F  negative_control    XXXXXXXXXXXXXXXXXXX
Control-KO-5-F  Mouse   Control-KO-5-F  negative_control    XXXXXXXXXXXXXXXXXXX
...

This causes the error. But when I change the control-guide section to the version below, it works.

sgID    sublibrary  gene    transcripts sequence
3-KO-2  Mouse   Ucp3    all XXXXXXXXXXXXXXXXXXX
3-KO-3  Mouse   Ucp3    all XXXXXXXXXXXXXXXXXXX
3-KO-4  Mouse   Ucp3    all XXXXXXXXXXXXXXXXXXX
Control-KO-1-F  Mouse   negative_control    na  XXXXXXXXXXXXXXXXXXX
Control-KO-2-F  Mouse   negative_control    na  XXXXXXXXXXXXXXXXXXX
Control-KO-3-F  Mouse   negative_control    na  XXXXXXXXXXXXXXXXXXX
Control-KO-4-F  Mouse   negative_control    na  XXXXXXXXXXXXXXXXXXX
Control-KO-5-F  Mouse   negative_control    na  XXXXXXXXXXXXXXXXXXX
...

I admit I'm not sure what columns 3 and 4 represent in the process. Maybe you could update the instructions (README) so that users know what format the input files must have for the scripts to work.

It would be great if you could explain here what happens in the script that causes such an error.

thanks
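A quick pre-flight check on the library table might catch this before the pipeline runs. The sketch below is only an assumption drawn from the two tables above: in the working version the control rows have `negative_control` in the `gene` column, so the check flags control-looking sgIDs where that is not the case. The function name `find_suspect_controls` and the `Control` prefix are hypothetical, and the rule has not been confirmed against the ScreenProcessing source.

```python
import csv
import io


def find_suspect_controls(table_text, control_prefix="Control"):
    """Return sgIDs that look like controls (by prefix, an assumption)
    but whose gene column is not 'negative_control'."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [
        row["sgID"]
        for row in reader
        if row["sgID"].startswith(control_prefix)
        and row["gene"] != "negative_control"
    ]


header = ["sgID", "sublibrary", "gene", "transcripts", "sequence"]

# Layout that produced the StopIteration error in this thread.
failing_rows = [
    ["3-KO-2", "Mouse", "Ucp3", "all", "XXXXXXXXXXXXXXXXXXX"],
    ["Control-KO-1-F", "Mouse", "Control-KO-1-F", "negative_control", "XXXXXXXXXXXXXXXXXXX"],
]

# Layout that worked.
working_rows = [
    ["3-KO-2", "Mouse", "Ucp3", "all", "XXXXXXXXXXXXXXXXXXX"],
    ["Control-KO-1-F", "Mouse", "negative_control", "na", "XXXXXXXXXXXXXXXXXXX"],
]


def to_tsv(rows):
    """Join the header and rows into tab-separated text."""
    return "\n".join("\t".join(r) for r in [header] + rows)


print(find_suspect_controls(to_tsv(failing_rows)))
print(find_suspect_controls(to_tsv(working_rows)))
```

For a real library file, the text could be read with `open(path).read()` and passed to the same function. An empty result would suggest the control rows follow the layout that worked here.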

abearab commented 8 months ago

@yeroslaviz you may also try this https://github.com/ArcInstitute/ScreenPro2 😋

I just added all features to fully replicate ScreenProcessing pipeline. If you want to use that, I can help (README has some missing information but I can update it asap).

yeroslaviz commented 8 months ago

Thanks for that. I'll give it a go as soon as I have some time. Just a question in advance - can I use it in a conda env?

If I see it correctly, it runs within a python session? (not in bash/shell)

abearab commented 8 months ago

> Thanks for that. I'll give it a go as soon as I have some time. Just a question in advance - can I use it in a conda env?
>
> If I see it correctly, it runs within a python session? (not in bash/shell)

Yeah, ideally you need to install and use it within a conda environment. This file can help you make a suitable environment.

https://github.com/ArcInstitute/ScreenPro2/blob/master/environment.yml

rosman83 commented 4 months ago

> @yeroslaviz you may also try this https://github.com/ArcInstitute/ScreenPro2 😋
>
> I just added all features to fully replicate ScreenProcessing pipeline. If you want to use that, I can help (README has some missing information but I can update it asap).

I also did a recreation of the screen processing pipeline although not sure if everything is production yet. I added a GUI to use the tool but I need to develop it a bit further.

yeroslaviz commented 4 months ago

> I also did a recreation of the screen processing pipeline although not sure if everything is production yet. I added a GUI to use the tool but I need to develop it a bit further.

Did you do it here or in ScreenPro2?