Closed m0nib closed 3 years ago
Hi,
Can you please tell me the output of the following commands:
ds.RNA.feats.fetch_all('I').sum()
and of this:
ds.cells.fetch_all('I').sum()
Also, if you print out ds
in a notebook cell, what do you see?
If you want to reset all cell level filtering, then this will do:
ds.cells.reset_key('I')
If you want to reset all the feature level filtering, then you need this:
ds.RNA.feats.reset_key('I')
Hi,
Can you please tell me the output of the following commands:
ds.RNA.feats.fetch_all('I').sum()
and of this:
ds.cells.fetch_all('I').sum()
Also, if you print out
ds
in a notebook cell, what do you see?
Hi Parashardapola,
output from these commands looks like this::
ds.RNA.feats.fetch_all("I").sum()
56490
time: 9.83 ms (started: 2021-09-30 09:41:48 +02:00)
ds.cells.fetch_all('I').sum()
973667
time: 13.1 ms (started: 2021-09-30 09:42:03 +02:00)
ds
DataStore has 973667 (982538) cells with 1 assays: RNA Cell metadata: 'I', 'ids', 'names', 'RNA_nCounts', 'RNA_nFeatures', 'RNA_percentMito', 'RNA_percentRibo', 'annotation.l1', 'annotation.l2', 'assay', 'assay_ontology_term_id', 'cell_type', 'cell_type_ontology_term_id', 'dataset_origin', 'development_stage', 'development_stage_ontology_term_id', 'disease', 'disease_ontology_term_id', 'donor', 'ethnicity', 'ethnicity_ontology_term_id', 'health_status', 'original_annotation', 'predicted.annotation.l1.score', 'predicted.annotation.l2.score', 'prediction_score_l1_bin', 'prediction_score_l2_bin', 'sex', 'tissue', 'tissue_ontology_term_id',
RNA assay has 56490 (77521) features and following metadata: 'I', 'ids', 'names', 'dropOuts', 'nCells', 'stats_I_avg', 'stats_I_c_var2000.1', 'stats_I_normed_n', 'stats_I_normed_tot', 'stats_I_nz_mean', 'stats_I_sigmas'
time: 16.7 ms (started: 2021-09-30 09:42:12 +02:00)
I will reset the keys and try again, as it failed again.
Best regards Monib
It seems like if you re-apply the filer_cells function cells and feats both retain the previous filter key in column "I" rather than first reset the keys and then apply the filter_cells.
HI,
I reset the cells and feats keys and ran only
ds.filter_cells(attrs=['RNA_percentMito'], highs=[15], lows=[0])
followed by mark_hvgs using
ds.mark_hvgs( min_cells=20, top_n=500, min_mean=3, max_mean=2, max_var=6 )
I am still getting the same error:
(RNA) Computing nCells: 100%| 257650/257650 [03:56] (RNA) Computing normed_tot: 100%| 257650/257650 [04:05] (RNA) Computing sigmas: 100%| 257650/257650 [04:47]
WARNING: WARNING: Number of valid features are less then value of parameter top_n
: 500. Resetting top_n
to 0
IndexError Traceback (most recent call last)
/tmp/ipykernel_40246/1850153007.py in
~/anaconda3/envs/scarf_env/lib/python3.8/site-packages/scarf/datastore.py in mark_hvgs(self, from_assay, cell_key, min_cells, top_n, min_var, max_var, min_mean, max_mean, n_bins, lowess_frac, blacklist, show_plot, hvg_key_name, **plot_kwargs) 3565 f"of cells will be considered HVGs." 3566 ) -> 3567 assay.mark_hvgs( 3568 cell_key, 3569 min_cells,
~/anaconda3/envs/scarf_env/lib/python3.8/site-packages/scarf/assay.py in mark_hvgs(self, cell_key, min_cells, top_n, min_var, max_var, min_mean, max_mean, n_bins, lowess_frac, blacklist, hvg_key_name, show_plot, **plot_kwargs) 932 top_n = n_valid_feats - 1 933 min_var = ( --> 934 pd.Series(self.feats.fetch_all(col_renamer(c_var_col)))[idx] 935 .sort_values(ascending=False) 936 .values[top_n]
IndexError: index -1 is out of bounds for axis 0 with size 0
Not sure what is happening here or what i am doing wrong.
Could you help with this.
Thank you
Best regards Monib
Hi @m0nib,
That's a correct observation. The filter_cells
function will not 'filter-in' cells that were previously 'filtered out'. This design decision was put in after a lot of thought. Sometimes the users perform a manual filtering step and then use filter_cells
. In such cases, automatically resetting the filtering is not a good idea.
You can always force reset the filter using reset_key
as I showed above.
Additionally the filter_cells
function has a parameter called reset_previous
. You can set this to True
to automatically reset any previous filtering. If you do this, then you don't need to manually use the reset_key
that I showed above.
Hi @m0nib,
There is a problem with your parameters in mark_hvgs
You are using a value for min_mean
that is greater than max_mean
(3>2). min_mean
should be lower than max_mean
. I would suggest that you don't use min_mean
parameter for starters and look at the results. Maybe paste the plot that you get after successfully running mark_hvgs
and then I can explain more.
Hi,
It worked now.
Thank you Best regards Monib
Great! I'll close this now
I loaded dataset as scarf.DataStore with the zarr directory then ran through the pipeline, i guess i made an error in the filter_cells function, now all the feats in column I are labelled False.
Therefore, when running mark_hvgs i keep getting an error:
ds.mark_hvgs( min_cells=20, top_n=500, min_mean=3, max_mean=2, max_var=6 )
Error: WARNING: WARNING: Number of valid features are less then value of parameter
top_n
: 500. Resettingtop_n
to 0IndexError Traceback (most recent call last) /tmp/ipykernel_32217/1850153007.py in
----> 1 ds.mark_hvgs(
2 min_cells=20,
3 top_n=500,
4 min_mean=3,
5 max_mean=2,
~/anaconda3/envs/scarf_env/lib/python3.8/site-packages/scarf/datastore.py in mark_hvgs(self, from_assay, cell_key, min_cells, top_n, min_var, max_var, min_mean, max_mean, n_bins, lowess_frac, blacklist, show_plot, hvg_key_name, **plot_kwargs) 3565 f"of cells will be considered HVGs." 3566 ) -> 3567 assay.mark_hvgs( 3568 cell_key, 3569 min_cells,
~/anaconda3/envs/scarf_env/lib/python3.8/site-packages/scarf/assay.py in mark_hvgs(self, cell_key, min_cells, top_n, min_var, max_var, min_mean, max_mean, n_bins, lowess_frac, blacklist, hvg_key_name, show_plot, **plot_kwargs) 932 top_n = n_valid_feats - 1 933 min_var = ( --> 934 pd.Series(self.feats.fetch_all(col_renamer(c_var_col)))[idx] 935 .sort_values(ascending=False) 936 .values[top_n]
IndexError: index -1 is out of bounds for axis 0 with size 0
Could you advise on how to reset the column I or do i have to restart the piepline from re-creating the zarr files?
Thank you best regards Monib