linnarsson-lab / adolescent-mouse

Analysis pipeline for the adolescent mouse nervous system project

Error from AggregateL1 #3

Open brianherb opened 6 years ago

brianherb commented 6 years ago

I'm working through the adolescent-mouse pipeline and encountering a problem with the AggregateL1 task. Everything up to ClusterL1 appears to work; however, the issue might be that I did not prepare a classifier.pickle or the classified.loom file that train_classifier.py requires. If these files are indeed required, could you describe what they should contain? At this point, I am starting with one .loom file per replicate, a pooling_specification.tab file and a metadata.xlsx file, and nothing else.

Error:

/anaconda3/lib/python3.6/site-packages/scipy/stats/stats.py:245: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
  "values. nan values will be ignored.", RuntimeWarning)
ERROR: [pid 21613] Worker Worker(salt=760780744, workers=1, host=bherblt-osx.som.umaryland.edu, username=bherb, pid=21613) failed AggregateL1(tissue=E10, n_auto_genes=6)

Thank you, Brian

slinnarsson commented 6 years ago

Hi

That's just a warning, and can be safely ignored. It's SciPy (scipy/stats/stats.py) telling you that some array (due to its dtype) could not be checked for NaNs; it does not indicate an error.

Do you not get an output? AggregateL1 should yield a file that ends in .agg.loom. Next step will be to run ExportL1 to get the plots and such.

Running without the classifier should still be possible; it will just not assign the Class and Subclass attributes.

brianherb commented 6 years ago

Thank you for the reply. I do not get a .agg.loom file, and my L1_E10.agg.loom.log file consists of:

2018-07-30 10:39:57,956 INFO: tissue = E10
2018-07-30 10:39:57,956 INFO: n_auto_genes = 6
2018-07-30 10:39:57,956 INFO: ===

I'm copying the entire output from the call:

luigi --local-scheduler --module adolescent_mouse AggregateL1 --tissue E10 --paths-samples /local/projects/idea/bherb/HRP/adolescent_mouse/samples/ --paths-build /local/projects/idea/bherb/HRP/adolescent_mouse/results/

to see if I'm missing something. I believe that ClusterL1 ran successfully.

Output:

/home/bherb/anaconda3/lib/python3.6/site-packages/luigi/parameter.py:261: UserWarning: Parameter "task_process_context" with value "None" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
DEBUG: Checking if AggregateL1(tissue=E10, n_auto_genes=6) is complete
2018-07-30 11:48:17,161 DEBUG: Checking if AggregateL1(tissue=E10, n_auto_genes=6) is complete
/home/bherb/anaconda3/lib/python3.6/site-packages/luigi/parameter.py:261: UserWarning: Parameter "filter_cellcycle" with value "None" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
/home/bherb/anaconda3/lib/python3.6/site-packages/luigi/parameter.py:261: UserWarning: Parameter "layer" with value "None" is not of type string.
  warnings.warn('Parameter "{}" with value "{}" is not of type string.'.format(param_name, param_value))
DEBUG: Checking if ClusterL1(tissue=E10, n_genes=1000, gtsne=True, alpha=1, filter_cellcycle=None, layer=None) is complete
2018-07-30 11:48:17,167 DEBUG: Checking if ClusterL1(tissue=E10, n_genes=1000, gtsne=True, alpha=1, filter_cellcycle=None, layer=None) is complete
INFO: Informed scheduler that task AggregateL1_6_E10_dfc28be60a has status PENDING
2018-07-30 11:48:17,171 INFO: Informed scheduler that task AggregateL1_6_E10_dfc28be60a has status PENDING
INFO: Informed scheduler that task ClusterL1_1_None_True_c6ea1039c0 has status DONE
2018-07-30 11:48:17,171 INFO: Informed scheduler that task ClusterL1_1_None_True_c6ea1039c0 has status DONE
INFO: Done scheduling tasks
2018-07-30 11:48:17,171 INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
2018-07-30 11:48:17,171 INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
2018-07-30 11:48:17,172 DEBUG: Asking scheduler for work...
2018-07-30 11:48:17,172 DEBUG: Starting pruning of task graph
2018-07-30 11:48:17,173 DEBUG: Done pruning task graph
DEBUG: Pending tasks: 1
2018-07-30 11:48:17,173 DEBUG: Pending tasks: 1
INFO: [pid 121710] Worker Worker(salt=981881326, workers=1, host=galactus.igs.umaryland.edu, username=bherb, pid=121710) running AggregateL1(tissue=E10, n_auto_genes=6)
2018-07-30 11:48:17,173 INFO: [pid 121710] Worker Worker(salt=981881326, workers=1, host=galactus.igs.umaryland.edu, username=bherb, pid=121710) running AggregateL1(tissue=E10, n_auto_genes=6)
2018-07-30 11:48:17,184 INFO: tissue = E10
2018-07-30 11:48:17,184 INFO: n_auto_genes = 6
2018-07-30 11:48:17,184 INFO: ===
2018-07-30 11:48:17,277 INFO: Aggregating clusters by mean
/home/bherb/anaconda3/lib/python3.6/site-packages/scipy/stats/stats.py:245: RuntimeWarning: The input array could not be properly checked for nan values. nan values will be ignored.
  "values. nan values will be ignored.", RuntimeWarning)
ERROR: [pid 121710] Worker Worker(salt=981881326, workers=1, host=galactus.igs.umaryland.edu, username=bherb, pid=121710) failed AggregateL1(tissue=E10, n_auto_genes=6)
Traceback (most recent call last):
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/luigi/worker.py", line 205, in run
    new_deps = self._run_get_new_deps()
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/luigi/worker.py", line 142, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/bherb/software/adolescent-mouse/adolescent_mouse/adolescent_L1/aggregate_L1.py", line 33, in run
    cg.Aggregator().aggregate(ds, out_file)
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/cytograph-0.6.1-py3.6.egg/cytograph/aggregator.py", line 42, in aggregate
    cg.aggregate_loom(ds, out_file, None, "Clusters", "mean", agg_spec)
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/cytograph-0.6.1-py3.6.egg/cytograph/aggregator.py", line 147, in aggregate_loom
    ca[key] = npg.aggregate(zero_strt_sort_noholes_lbls, ds.col_attrs[key], func=mode, fill_value=0).astype('str')
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/numpy_groupies-0+unknown-py3.6.egg/numpy_groupies/aggregate_numpy.py", line 288, in aggregate
    _impl_dict=_impl_dict, _nansqueeze=True, **kwargs)
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/numpy_groupies-0+unknown-py3.6.egg/numpy_groupies/aggregate_numpy.py", line 262, in _aggregate_base
    dtype=dtype, **kwargs)
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/numpy_groupies-0+unknown-py3.6.egg/numpy_groupies/aggregate_numpy.py", line 212, in _generic_callable
    ret[i] = func(grp)
ValueError: could not convert string to float: 'Unknown'
2018-07-30 11:48:17,337 ERROR: [pid 121710] Worker Worker(salt=981881326, workers=1, host=galactus.igs.umaryland.edu, username=bherb, pid=121710) failed AggregateL1(tissue=E10, n_auto_genes=6)
Traceback (most recent call last):
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/luigi/worker.py", line 205, in run
    new_deps = self._run_get_new_deps()
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/luigi/worker.py", line 142, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/bherb/software/adolescent-mouse/adolescent_mouse/adolescent_L1/aggregate_L1.py", line 33, in run
    cg.Aggregator().aggregate(ds, out_file)
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/cytograph-0.6.1-py3.6.egg/cytograph/aggregator.py", line 42, in aggregate
    cg.aggregate_loom(ds, out_file, None, "Clusters", "mean", agg_spec)
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/cytograph-0.6.1-py3.6.egg/cytograph/aggregator.py", line 147, in aggregate_loom
    ca[key] = npg.aggregate(zero_strt_sort_noholes_lbls, ds.col_attrs[key], func=mode, fill_value=0).astype('str')
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/numpy_groupies-0+unknown-py3.6.egg/numpy_groupies/aggregate_numpy.py", line 288, in aggregate
    _impl_dict=_impl_dict, _nansqueeze=True, **kwargs)
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/numpy_groupies-0+unknown-py3.6.egg/numpy_groupies/aggregate_numpy.py", line 262, in _aggregate_base
    dtype=dtype, **kwargs)
  File "/home/bherb/anaconda3/lib/python3.6/site-packages/numpy_groupies-0+unknown-py3.6.egg/numpy_groupies/aggregate_numpy.py", line 212, in _generic_callable
    ret[i] = func(grp)
ValueError: could not convert string to float: 'Unknown'
DEBUG: 1 running tasks, waiting for next task to finish
2018-07-30 11:48:17,382 DEBUG: 1 running tasks, waiting for next task to finish
2018-07-30 11:48:17,387 DEBUG: AggregateL1_6_E10_dfc28be60a task num failures is 1 and limit is 999999999
INFO: Informed scheduler that task AggregateL1_6_E10_dfc28be60a has status FAILED
2018-07-30 11:48:17,387 INFO: Informed scheduler that task AggregateL1_6_E10_dfc28be60a has status FAILED
DEBUG: Asking scheduler for work...
2018-07-30 11:48:17,387 DEBUG: Asking scheduler for work...
2018-07-30 11:48:17,387 DEBUG: Starting pruning of task graph
2018-07-30 11:48:17,387 DEBUG: Done pruning task graph
DEBUG: Done
2018-07-30 11:48:17,387 DEBUG: Done
DEBUG: There are no more tasks to run at this time
2018-07-30 11:48:17,387 DEBUG: There are no more tasks to run at this time
DEBUG: There are 1 pending tasks possibly being run by other workers
2018-07-30 11:48:17,387 DEBUG: There are 1 pending tasks possibly being run by other workers
DEBUG: There are 1 pending tasks unique to this worker
2018-07-30 11:48:17,388 DEBUG: There are 1 pending tasks unique to this worker
DEBUG: There are 1 pending tasks last scheduled by this worker
2018-07-30 11:48:17,388 DEBUG: There are 1 pending tasks last scheduled by this worker
INFO: Worker Worker(salt=981881326, workers=1, host=galactus.igs.umaryland.edu, username=bherb, pid=121710) was stopped. Shutting down Keep-Alive thread
2018-07-30 11:48:17,388 INFO: Worker Worker(salt=981881326, workers=1, host=galactus.igs.umaryland.edu, username=bherb, pid=121710) was stopped. Shutting down Keep-Alive thread
INFO: ===== Luigi Execution Summary =====

Scheduled 2 tasks of which:

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

2018-07-30 11:48:17,391 INFO: ===== Luigi Execution Summary =====

Scheduled 2 tasks of which:

This progress looks :( because there were failed tasks

===== Luigi Execution Summary =====

slinnarsson commented 6 years ago

Probably some attribute is missing. If you don't provide an agg_spec argument to Aggregator().aggregate(), it defaults to

agg_spec = {
    "Age": "tally",
    "Clusters": "first",
    "Class": "mode",
    "_Total": "mean",
    "Sex": "tally",
    "Tissue": "tally",
    "SampleID": "tally",
    "TissuePool": "first",
    "Outliers": "mean"
}

You can supply your own agg_spec to indicate which of your attributes you want to aggregate and how.
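As a rough illustration of supplying a custom agg_spec, the sketch below starts from the default spec quoted above, keeps only the entries whose attribute actually exists in your file, and lets you override how individual attributes are aggregated. The helper name build_agg_spec and the example attribute list are hypothetical, and the choice of "first" for Class is just one possible override (the traceback suggests the "mode" aggregation hit the string value 'Unknown'). The final call is left commented out because the keyword name agg_spec is taken from the description above and has not been checked against the cytograph source.

```python
from typing import Dict, Iterable, Optional

# Default spec as quoted in this thread.
DEFAULT_AGG_SPEC: Dict[str, str] = {
    "Age": "tally",
    "Clusters": "first",
    "Class": "mode",
    "_Total": "mean",
    "Sex": "tally",
    "Tissue": "tally",
    "SampleID": "tally",
    "TissuePool": "first",
    "Outliers": "mean",
}


def build_agg_spec(
    available_attrs: Iterable[str],
    overrides: Optional[Dict[str, str]] = None,
    defaults: Dict[str, str] = DEFAULT_AGG_SPEC,
) -> Dict[str, str]:
    """Hypothetical helper: restrict a spec to attributes that exist.

    Entries for attributes missing from the file are dropped; overrides
    replace or add the aggregation method for attributes that are present.
    """
    available = set(available_attrs)
    spec = {key: how for key, how in defaults.items() if key in available}
    for key, how in (overrides or {}).items():
        if key in available:
            spec[key] = how
    return spec


# Made-up column attribute names; in practice these would come from the
# column attributes of your own .loom file.
example_attrs = ["CellID", "Clusters", "Class", "SampleID", "_Total"]
example_spec = build_agg_spec(example_attrs, overrides={"Class": "first"})
print(example_spec)

# Assumed usage, with cytograph imported as cg and ds an open loom connection:
# cg.Aggregator().aggregate(ds, out_file, agg_spec=example_spec)
```

Filtering against the attributes that are present avoids asking the aggregator for columns your samples never had, while the overrides let you sidestep an aggregation method that does not suit a particular attribute's values.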

The code for cytograph and adolescent_mouse is currently quite specific to our particular project, and it will take some effort to make them work for a different project. We provide the code in the interest of transparency, but it is not designed or meant for general use. I'm working on extracting some of the more useful parts into a standalone library suitable for general use, but this will take some time.