Closed olgabot closed 6 years ago
I'm not familiar with pandas, but looking at the API I guess you would like the dataframe column names to be converted to attribute names, and the matching column to a global attribute?
That seems like a useful enough convenience function.
On the other hand, loompy is currently relatively bare-bones: we're not using pandas in the library, and I think it would be nice if we didn't force it on people who don't use it.
What about making some kind of glue- or wrapper-library that adds this functionality, or monkey-patches it in?
@gioelelm, you also use pandas a lot, right?
Would it make sense to create a separate loom-pandas package which is merely a wrapper around loompy that handles conversions back and forth?
@JobLeonard I personally prefer to work directly with numpy, especially for big data, to have more control over the efficiency of my matrix operations. However, pandas is a big deal in the pydata community, so it is reasonable to provide some function that bridges the two packages.
However, I don't think it is worth going as far as making another package. A method load_attrs_from_df should do.
Maybe even that is overkill: all it takes for an appropriate conversion from pandas to a col attr dictionary is a single function (actually method) call.
>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2,3], 'col2': [0.5, 0.75, 1]}, index=['a', 'b','c'])
>>> df.to_dict("list")
{'col1': [1, 2, 3], 'col2': [0.5, 0.75, 1.0]}
I am sure that @olgabot is referring to some more tricky situations, but then the question is how to predict all the possible scenarios since there is no standard pandas format for storing this kind of metadata.
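One of the trickier cases is probably the index: as the output above shows, to_dict("list") drops it. Below is a minimal sketch of a conversion that keeps it, assuming the index holds IDs that should become an attribute. The helper name df_to_attrs and the attribute name CellID are hypothetical and not part of loompy.

```python
import pandas as pd


def df_to_attrs(df, index_name=None):
    """Convert a DataFrame to a dict of 1-D numpy arrays, one per column.

    Hypothetical helper, not part of loompy. If index_name is given,
    the DataFrame index is stored under that key as well.
    """
    attrs = {str(col): df[col].to_numpy() for col in df.columns}
    if index_name is not None:
        if index_name in attrs:
            raise ValueError(f"column {index_name!r} already exists")
        attrs[index_name] = df.index.to_numpy()
    return attrs


df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [0.5, 0.75, 1]},
                  index=['a', 'b', 'c'])
col_attrs = df_to_attrs(df, index_name='CellID')
```

Raising on a name clash is one possible choice; silently overwriting a column with the index seemed like the riskier default.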
Well, if the "base case" is that simple, we should probably include an example in the documentation, that is, a small section on "integrating with pandas".
A pandas-oriented tutorial would be good! But no special code unless there’s a very compelling case.
Sten
I pushed a fix that does more extensive normalization of inputs during create() and set_attr(). You should now be able to pass list, tuple, np.ndarray, np.matrix or scipy.sparse, and the elements can be any kind of string, string object, or number. All will be normalized to conform to the spec.
You can now directly convert a pandas DataFrame to a row/col dictionary for create(), like @gioelelm suggested (but now it works):
col_attrs = df.to_dict("list")
Let me know if this is good enough.
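For reference, here is a rough sketch of how the conversion could look for both gene and cell metadata. The toy matrix, the Gene column and the file name example.loom are made up for illustration, and the create() call is left as a comment because its exact signature may differ between loompy versions.

```python
import numpy as np
import pandas as pd

# Toy data: 2 genes (rows) x 3 cells (columns)
matrix = np.arange(6).reshape(2, 3)
cell_df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [0.5, 0.75, 1]},
                       index=['a', 'b', 'c'])
gene_df = pd.DataFrame({'Gene': ['g1', 'g2']})

# One list per DataFrame column; note that the index is not included
col_attrs = cell_df.to_dict("list")
row_attrs = gene_df.to_dict("list")

# Sanity check: one value per matrix column / row
assert all(len(v) == matrix.shape[1] for v in col_attrs.values())
assert all(len(v) == matrix.shape[0] for v in row_attrs.values())

# Assumed call shape, not executed here:
# loompy.create("example.loom", matrix, row_attrs, col_attrs)
```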
All of my gene and cell metadata is stored as pandas dataframes and it's a pain to have to convert those to dictionaries every time for loompy. Can loompy simply accept Pandas dataframes for these attributes?