Hi, could you please paste the whole error message here? Thanks!
On Mon, Feb 13, 2023, 06:17, Catherine Sargent wrote:
Hi,
I am keen to use swan_vis to explore the results of running TALON on our single-cell dataset. Unfortunately, when reading the filtered abundance information into the SwanGraph, I get a memory error. [Screenshot of the MemoryError: https://user-images.githubusercontent.com/38214629/218480192-36ddfca9-9e8f-4288-b699-db0acd65a1c9.png]
We have 9,992 cells and 17,808 transcripts. The error comes from trying to create an array with shape (9992, 3033596); I am not sure what the 3033596 refers to. I tried increasing the memory allocation to 200 GB on the HPC, but it still fails, and I am not granted more resources than that for my job. Do you have any suggestions for how to get around this?
Many thanks, Catherine
Sure. First of all, there were lots of these warnings:
/projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/swan_vis/utils.py:456: PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[total_col] = df[c].sum()
  df[cond_col] = (df[c]*1000000)/df[total_col]
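(Aside for anyone hitting the same PerformanceWarning: it fires when a DataFrame is grown one column at a time. A minimal, self-contained sketch of the pattern being warned about, and the pd.concat(axis=1) alternative pandas suggests; all names and data here are made up:)

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 1_000, 500

# Fragmenting pattern: each assignment inserts one new column, which is
# what triggers the PerformanceWarning after many inserts.
df = pd.DataFrame(index=range(n_rows))
for i in range(n_cols):
    df[f"col{i}"] = np.random.rand(n_rows)

# Suggested alternative: build all the columns first, then join them once.
cols = [pd.Series(np.random.rand(n_rows), name=f"col{i}") for i in range(n_cols)]
df = pd.concat(cols, axis=1)
```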
And then this error:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
Cell In[16], line 2
1 # add each dataset's abundance information to the SwanGraph
----> 2 sg.add_abundance(ab_file)
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/swan_vis/swangraph.py:346, in SwanGraph.add_abundance(self, counts_file)
343 self.adata.layers['pi'] = calc_pi(self.adata, self.t_df)[0].to_numpy()
345 # add abundance for edges, TSS per gene, and TES per gene
--> 346 self.create_edge_adata()
347 self.create_end_adata(kind='tss')
348 self.create_end_adata(kind='tes')
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/swan_vis/swangraph.py:512, in SwanGraph.create_edge_adata(self)
509 t_exp_df = pd.DataFrame(columns=obs, data=data, index=tid)
511 # merge counts per transcript with edges
--> 512 edge_exp_df = edge_exp_df.merge(t_exp_df, how='left',
513 left_index=True, right_index=True)
515 # sum the counts per transcript / edge / dataset
516 edge_exp_df = edge_exp_df.groupby('edge_id').sum()
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/pandas/core/frame.py:10090, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
10071 @Substitution("")
10072 @Appender(_merge_doc, indents=2)
10073 def merge(
(...)
10086 validate: str | None = None,
10087 ) -> DataFrame:
10088 from pandas.core.reshape.merge import merge
--> 10090 return merge(
10091 self,
10092 right,
10093 how=how,
10094 on=on,
10095 left_on=left_on,
10096 right_on=right_on,
10097 left_index=left_index,
10098 right_index=right_index,
10099 sort=sort,
10100 suffixes=suffixes,
10101 copy=copy,
10102 indicator=indicator,
10103 validate=validate,
10104 )
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/pandas/core/reshape/merge.py:124, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
93 @Substitution("\nleft : DataFrame or named Series")
94 @Appender(_merge_doc, indents=0)
95 def merge(
(...)
108 validate: str | None = None,
109 ) -> DataFrame:
110 op = _MergeOperation(
111 left,
112 right,
(...)
122 validate=validate,
123 )
--> 124 return op.get_result(copy=copy)
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/pandas/core/reshape/merge.py:775, in _MergeOperation.get_result(self, copy)
771 self.left, self.right = self._indicator_pre_merge(self.left, self.right)
773 join_index, left_indexer, right_indexer = self._get_join_info()
--> 775 result = self._reindex_and_concat(
776 join_index, left_indexer, right_indexer, copy=copy
777 )
778 result = result.__finalize__(self, method=self._merge_type)
780 if self.indicator:
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/pandas/core/reshape/merge.py:766, in _MergeOperation._reindex_and_concat(self, join_index, left_indexer, right_indexer, copy)
764 left.columns = llabels
765 right.columns = rlabels
--> 766 result = concat([left, right], axis=1, copy=copy)
767 return result
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
325 if len(args) > num_allow_args:
326 warnings.warn(
327 msg.format(arguments=_format_argument_list(allow_args)),
328 FutureWarning,
329 stacklevel=find_stack_level(),
330 )
--> 331 return func(*args, **kwargs)
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/pandas/core/reshape/concat.py:381, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
159 """
160 Concatenate pandas objects along a particular axis.
161
(...)
366 1 3 4
367 """
368 op = _Concatenator(
369 objs,
370 axis=axis,
(...)
378 sort=sort,
379 )
--> 381 return op.get_result()
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/pandas/core/reshape/concat.py:616, in _Concatenator.get_result(self)
612 indexers[ax] = obj_labels.get_indexer(new_labels)
614 mgrs_indexers.append((obj._mgr, indexers))
--> 616 new_data = concatenate_managers(
617 mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy
618 )
619 if not self.copy:
620 new_data._consolidate_inplace()
File /projects/b1177/pythonenvs/scanpy-env/lib/python3.9/site-packages/pandas/core/internals/concat.py:212, in concatenate_managers(mgrs_indexers, axes, concat_axis, copy)
210 values = blk.values
211 if copy:
--> 212 values = values.copy()
213 else:
214 values = values.view()
MemoryError: Unable to allocate 113. GiB for an array with shape (9992, 3033596) and data type float32
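(Side note: the requested allocation is consistent with the reported shape. A quick back-of-the-envelope check, assuming 4 bytes per float32 entry:)

```python
# 9992 cells x 3,033,596 features, 4 bytes each (float32)
n_bytes = 9992 * 3033596 * 4
print(n_bytes / 2**30)  # ~112.9, i.e. the "113. GiB" in the MemoryError
```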
Thanks for the quick response!
I'll see if there's anything I can do to decrease the memory usage at this step, but in the meantime, have you filtered your cells and transcripts down to the final set that you're planning to use for the analysis? I definitely recommend doing this before using Swan!
I had filtered down to the set of cells passing QC in the SR dataset, and then filtered the transcripts as described in your paper, i.e. for unknown transcripts, at least 1 count in a minimum of 4 cells and not flagged as internal priming.
To get around the memory issue, I have for now filtered down to just two genes, and their respective transcripts, that we're particularly interested in.
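(In case it helps anyone doing the same, a minimal sketch of that kind of subsetting with pandas; the file names, gene names, and the `annot_gene_name` column are assumptions about the TALON abundance format rather than anything from this thread:)

```python
import pandas as pd

# Hypothetical subsetting of a TALON abundance table to two genes of
# interest before building the SwanGraph; paths and gene names are placeholders.
ab = pd.read_csv("talon_abundance_filtered.tsv", sep="\t")
keep = ab["annot_gene_name"].isin(["GENE1", "GENE2"])
ab[keep].to_csv("talon_abundance_subset.tsv", sep="\t", index=False)
```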
Hi there,
Sorry it has taken me so long to respond. I've added a few new initialization options that might help with your problem. By default, Swan generates expression matrices for your transcripts as well as for TSSs, TESs, and individual exons, the last of which, as you might imagine, ends up having a lot of features.
The options I've added turn off the creation of these expression matrices. If you don't need them, run your SwanGraph initialization as:
sg = swan.SwanGraph(sc=True, edge_adata=False, end_adata=False)
You'll have to install from the latest commits. Let me know if you are able to give this a try!
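(For anyone following along, a sketch of the full sequence under those options. Installing from the repo with pip is one way to pick up the latest commits; the GTF paths are placeholders, and only the SwanGraph call and add_abundance come straight from this thread:)

```python
# One way to install from the latest commits (run in a shell):
#   pip install git+https://github.com/mortazavilab/swan_vis.git

import swan_vis as swan

# Skip building the per-edge and per-TSS/TES expression matrices,
# which is where the huge (9992, 3033596) allocation came from.
sg = swan.SwanGraph(sc=True, edge_adata=False, end_adata=False)
sg.add_annotation("annotation.gtf")    # placeholder reference GTF
sg.add_transcriptome("talon.gtf")      # placeholder TALON GTF
sg.add_abundance(ab_file)              # the same abundance file as above
```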