mfumagalli / ImaGene

Estimation of population genetic parameters using deep learning
GNU General Public License v3.0

List index out of range error in `01_binary.ipynb` #2

Closed (mathemage closed this issue 4 years ago)

mathemage commented 4 years ago

The issue

I have run the binary tutorial up to this line:

https://github.com/mfumagalli/ImaGene/blob/9f1f127f3aeac7c63d013107cb4959ed1e40d4b3/Tutorials/01_binary.ipynb#L332

but it fails with an index out of range error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/content/gdrive/My Drive/git-repos/ImaGene/ImaGene.py in <module>()
----> 1 gene_sim = file_sim.read_simulations(parameter_name='selection_coeff_hetero', max_nrepl=2000)

1 frames
/content/gdrive/My Drive/git-repos/ImaGene/ImaGene.py in __init__(self, data, positions, description, targets, parameter_name, classes)
    334         self.dimensions = (np.zeros(len(self.data)), np.zeros(len(self.data)))
    335         # initialise dimensions to the first image (in case we have only one)
--> 336         self.dimensions[0][0] = self.data[0].shape[0]
    337         self.dimensions[1][0] = self.data[0].shape[1]
    338         # if reads from real data, then stop here otherwise fill in all info on simulations

IndexError: list index out of range

What is wrong?

My configuration

The only thing I did beforehand was change all the file paths to paths on my system and create the simulation folder explicitly:

simfolder = f'{base_dir}Binary/Simulations1'
%mkdir "$simfolder" -p

so that I call it as

file_sim = ImaFile(simulations_folder=simfolder, nr_samples=198, model_name='Marth-3epoch-CEU')

mathemage commented 4 years ago

@mfumagalli can you have a look at this, please?

mfumagalli commented 4 years ago

it works fine for me; did you simulate enough replicates? if less than 2000 as stated in "max_nrepl" then it may throw an error; try to change that parameter to a smaller value; let me know

mathemage commented 4 years ago

@mfumagalli

> it works fine for me;

Have you tried to run this notebook from a clean state (from a new environment/new computer)? From somewhere, where nothing has been done/generated before.

This notebook/tutorial should work for anyone no matter what, so better check all prerequisite steps are written down in instructions and all the preceding code/commands run are documented too.

> did you simulate enough replicates?

I have no idea what replicates mean or what you are talking about.

Notebook tutorials are supposed to work for any user regardless of their knowledge of the domain (biology). If there are any further steps that I should have done before this, they need to be written down here.

> if less than 2000 as stated in "max_nrepl" then it may throw an error;

I didn't change anything in your setting/command, I left everything intact.

Nevertheless, the notebook does not work as-is, it is buggy even in its default form.

> try to change that parameter to a smaller value; let me know

If that's the case, it should be mentioned (some warning at least) or the value in the notebook should be set to a better one.

In that case, a more informative error message should also be provided (a more descriptive error, a specialized exception...)
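A rough sketch of what such a check could look like follows. The traceback shows `self.data[0]` being indexed, and indexing an empty list is what produces `IndexError: list index out of range`. The names `NoSimulationsError` and `require_non_empty` are hypothetical and are not part of ImaGene; where such a check would belong in the library is left open.

```python
class NoSimulationsError(ValueError):
    """Hypothetical exception: raised when no simulated images were loaded."""


def require_non_empty(data, folder):
    """Return data unchanged, or fail with a message that names the folder.

    Hypothetical helper for illustration, not part of ImaGene.
    """
    if len(data) == 0:
        raise NoSimulationsError(
            f"No simulations were read from '{folder}'. "
            "Check that the simulations were generated and that the path is correct."
        )
    return data


# An empty list reproduces the situation behind the IndexError in the traceback.
try:
    require_non_empty([], "Binary/Simulations1")
except NoSimulationsError as err:
    message = str(err)
    print(message)
```

With a check like this, an empty result would point at the folder rather than at an index.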

I tried your suggestion by running:

gene_sim = file_sim.read_simulations(parameter_name='selection_coeff_hetero', max_nrepl=20)

yet still the same error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/content/gdrive/My Drive/git-repos/ImaGene/ImaGene.py in <module>()
----> 1 gene_sim = file_sim.read_simulations(parameter_name='selection_coeff_hetero', max_nrepl=20)

1 frames
/content/gdrive/My Drive/git-repos/ImaGene/ImaGene.py in __init__(self, data, positions, description, targets, parameter_name, classes)
    334         self.dimensions = (np.zeros(len(self.data)), np.zeros(len(self.data)))
    335         # initialise dimensions to the first image (in case we have only one)
--> 336         self.dimensions[0][0] = self.data[0].shape[0]
    337         self.dimensions[1][0] = self.data[0].shape[1]
    338         # if reads from real data, then stop here otherwise fill in all info on simulations

IndexError: list index out of range

I provided my screenshots below...

Screenshots

Screenshots of local Jupyter notebook

Screenshot from 2020-03-18 18-48-36

Screenshots of the notebook on (remote) Google Colaboratory

Screenshot from 2020-03-18 18-49-58 Screenshot from 2020-03-18 18-50-15

mathemage commented 4 years ago

Also @mfumagalli please don't hardcode absolute paths/names/settings that would work only on your computer.

Specifically, I have to change the /home/mfumagal/... paths everywhere, every time. It is troublesome for me and I believe it must be terribly frustrating for non-expert users. I would suggest at least extracting a path variable that can be set/changed at the beginning of the notebook.
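A minimal sketch of such a path variable, assuming the folder layout `Binary/Simulations1` used earlier in this issue. The environment variable `IMAGENE_BASE_DIR` and the helper `make_simfolder` are hypothetical names chosen for illustration; they are not something the notebook or ImaGene defines.

```python
import os
import tempfile
from pathlib import Path


def make_simfolder(base_dir):
    """Create <base_dir>/Binary/Simulations1 if needed and return its path."""
    simfolder = Path(base_dir) / "Binary" / "Simulations1"
    simfolder.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    return simfolder


# Set once at the top of the notebook. IMAGENE_BASE_DIR is a hypothetical
# environment variable; a temporary directory is the fallback for this demo.
base_dir = os.environ.get("IMAGENE_BASE_DIR") or tempfile.mkdtemp()
simfolder = make_simfolder(base_dir)
print(simfolder)
```

Every later cell would then derive its paths from `base_dir` instead of repeating an absolute path.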

Notebooks should be runnable by anyone as is, and that is how we should provide them.

mathemage commented 4 years ago

I suspect the index out of range error might be caused by the hardcoded absolute paths, as the simulations are generated under file paths containing the name mfumagal:

[Screenshot: ImaGene_iss_3]

Obviously, the subsequent code is unable to find those simulations in those paths as its current folder location is somewhere else (and the username is not mfumagal in general).
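One way to test this suspicion before calling `read_simulations` is to count what is actually stored under the simulation folder. The sketch below assumes only that simulations are saved as files somewhere under that folder; it makes no assumption about ImaGene's file names, and the helper `count_files` is hypothetical.

```python
import tempfile
from pathlib import Path


def count_files(folder):
    """Count regular files anywhere under folder; return 0 if it is missing.

    Hypothetical helper for illustration, not part of ImaGene.
    """
    root = Path(folder)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.is_file())


# Demonstration on a temporary folder standing in for the simulation folder.
with tempfile.TemporaryDirectory() as tmp:
    empty_count = count_files(tmp)  # freshly created, so nothing inside
    (Path(tmp) / "example.txt").write_text("placeholder")
    filled_count = count_files(tmp)

print(empty_count, filled_count)
```

A count of zero for `simfolder` would be consistent with the simulations having been written elsewhere.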

So I am trying to fix this in #3. When that gets merged, I can continue with this issue and see whether it works then.

mfumagalli commented 4 years ago

there was no hardcoding; the paths to msms and the folder to save to are set when modifying the parameters file (along with all other parameters); this is explained in the instructions but I have now made it clearer.