janosh opened 1 year ago
I'm working on this.
Hi @janosh, sorry for the long silence.
I'm thinking about cleaning up the code a little (starting with `density_scatter` and its siblings, then others if I have the energy).
In terms of dataset handling, I want to keep allowing users to pass various types (`ArrayLike`, `pd.DataFrame` and more) into the plotting functions, but internally convert them to a single data type (`pd.DataFrame`, maybe?). That would get rid of the data-processing code inside each plotter and avoid incompatibility issues like this one.
This should be neither hard (a simple utility function should do the job) nor breaking. Can I have the permission to implement this change? And do you have any suggestions on this matter? Thanks!
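A minimal sketch of such a utility, assuming plotters take `x`/`y` either as column names of an optional `df` or as 1D array-likes. `normalize_xy` and `_resolve` are hypothetical names, not existing pymatviz functions:

```python
from __future__ import annotations

from collections.abc import Sequence

import numpy as np
import pandas as pd


def _resolve(
    value: str | Sequence[float] | np.ndarray | pd.Series,
    df: pd.DataFrame | None,
    arg: str,
) -> np.ndarray:
    """Look up a column name in df, or convert an array-like to a NumPy array."""
    if isinstance(value, str):
        if df is None:
            raise ValueError(f"{arg}={value!r} is a column name, so df is required")
        return df[value].to_numpy()
    return np.asarray(value)


def normalize_xy(
    x: str | Sequence[float] | np.ndarray | pd.Series,
    y: str | Sequence[float] | np.ndarray | pd.Series,
    df: pd.DataFrame | None = None,
) -> pd.DataFrame:
    """Return a two-column DataFrame from x and y (hypothetical helper)."""
    # Keep user-given column names; fall back to "x"/"y" for raw array-likes.
    x_col = x if isinstance(x, str) else "x"
    y_col = y if isinstance(y, str) else "y"
    if x_col == y_col:
        raise ValueError("x and y resolve to the same column name")

    x_vals, y_vals = _resolve(x, df, "x"), _resolve(y, df, "y")
    if x_vals.ndim != 1 or x_vals.shape != y_vals.shape:
        raise ValueError(
            f"x and y must be 1D and of equal length, got {x_vals.shape} and {y_vals.shape}"
        )
    return pd.DataFrame({x_col: x_vals, y_col: y_vals})
```

Each plotter would then call the helper once up front and only ever handle a DataFrame, e.g. `normalize_xy("a", "b", df=df)` returns a DataFrame with columns `a` and `b`.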
> Hi @janosh, sorry for the long silence.

no worries at all.
> Can I have the permission to implement this change?

of course, that would be much appreciated! 🙏
> And do you have any suggestions on this matter?

yes! have a look at
which is used in `density_scatter` and many other plot functions to standardize input data.
> yes! have a look at

Thanks a lot @janosh. Maybe we would need to add a more general "to desired data type" utility function (not just `pd.DataFrame` to array). In that case, which do you prefer: `pd.DataFrame`, `np.ndarray`, or both? I personally prefer `np.ndarray` because it's more explicit and flat, and has efficient data-processing support. What do you think?
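A sketch of the more general helper under discussion, assuming a `target` switch that selects the output type. `to_target` and its `target` values are hypothetical. Routing every input through a DataFrame first keeps a single conversion path, and the `np.ndarray` output is then just `DataFrame.to_numpy()`:

```python
from __future__ import annotations

from typing import Literal

import numpy as np
import pandas as pd
from numpy.typing import ArrayLike


def to_target(
    data: pd.DataFrame | pd.Series | dict | ArrayLike,
    target: Literal["dataframe", "ndarray"] = "dataframe",
) -> pd.DataFrame | np.ndarray:
    """Convert common tabular inputs to one target type (hypothetical helper)."""
    if target not in ("dataframe", "ndarray"):
        raise ValueError(f"unknown target {target!r}")

    # Normalize everything to a DataFrame first so there is one conversion path.
    if isinstance(data, pd.DataFrame):
        df = data
    elif isinstance(data, pd.Series):
        df = data.to_frame()
    elif isinstance(data, dict):
        df = pd.DataFrame(data)
    else:
        arr = np.asarray(data)
        if arr.ndim == 1:
            arr = arr[:, np.newaxis]  # a flat sequence becomes a single column
        elif arr.ndim != 2:
            raise ValueError(f"expected 1D or 2D data, got {arr.ndim}D")
        df = pd.DataFrame(arr)

    # Column labels are dropped when converting to a plain array.
    return df if target == "dataframe" else df.to_numpy()
```

Converting via a DataFrame keeps column labels available until a caller explicitly asks for a bare array.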
On top of all these technical things, I noticed that despite all these beautiful plots, pymatviz seems to lack a proper documentation site (maybe I failed to find it?). This API page seems more like an aggregation of docstrings to me? If so, I would be more than happy to help build one.
> This API page seems more like an aggregation of docstrings to me? If so, I would be more than happy to help build one.

yes, that has been on my todo list for a long time. the current docs are terrible!
would be much nicer if we had a separate demo page for each plotting function with a few example invocations showing the corresponding figure. we could essentially copy all the cells in `_generate_assets.py` into separate files and write some docs/explanation around it.
i'd prefer not to use Jupyter notebooks for this as i find them more annoying to work with than Python scripts. luckily VS Code's Interactive Window can run Python scripts just like notebooks and also supports exporting cells and their output to HTML. we just need to find a way to script this functionality so that the HTML is updated whenever source files change.
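The "update the HTML whenever source files change" part could start as a simple timestamp check. A minimal sketch, assuming all example scripts live in one directory and each exports to a same-named `.html` file. `stale_scripts`, `rebuild` and the `export` callable are hypothetical, and the actual export step (VS Code's export, or e.g. `jupytext` plus `jupyter nbconvert --to html`) is left as an assumption:

```python
from __future__ import annotations

from collections.abc import Callable
from pathlib import Path


def stale_scripts(src_dir: Path, out_dir: Path) -> list[Path]:
    """Return scripts whose HTML export is missing or older than the script."""
    stale = []
    for script in sorted(src_dir.glob("*.py")):
        html = out_dir / f"{script.stem}.html"
        if not html.exists() or html.stat().st_mtime < script.stat().st_mtime:
            stale.append(script)
    return stale


def rebuild(
    src_dir: Path, out_dir: Path, export: Callable[[Path, Path], None]
) -> list[Path]:
    """Re-export only stale scripts via a hypothetical export(script, html_path)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stale = stale_scripts(src_dir, out_dir)
    for script in stale:
        export(script, out_dir / f"{script.stem}.html")
    return stale
```

A file watcher or pre-commit hook could then call `rebuild` so that only changed examples get re-exported.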
> Maybe we would need to add a more general "to desired data type" utility function (not just `pd.DataFrame` to array)

i'm open to that. i'd prefer dataframes over arrays though, as they have a more powerful API
> would be much nicer if we had a separate demo page for each plotting function with a few example invocations showing the corresponding figure.

I'm thinking of building a separate docs site with Sphinx, using templates like "Read the Docs". I have some experience with that and will start working on it soon (in a separate PR from the data preprocessing one) 😄.
> i'd prefer not to use Jupyter notebooks for this

I also find very long notebook-style scripts (like `_generate_assets.py`) hard to navigate. It would be much better to split the asset/example generation scripts by section (histogram, ptable and such). I'll push a new PR once I finish the draft so we can iterate on it together. Thanks!
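Splitting a long cell-based script per section could be partly automated, since VS Code's Interactive Window delimits cells with `# %%` comments. A minimal sketch, assuming each cell title starts with its section name (e.g. `# %% Histogram ...`); that naming convention and the helper names `split_cells` and `group_by_section` are hypothetical:

```python
from __future__ import annotations

import re

# "# %%" starts a cell in VS Code's Interactive Window; text after it is the cell title.
CELL_MARKER = re.compile(r"^# %%(.*)$")


def _append_cell(cells: list[tuple[str, str]], title: str, lines: list[str]) -> None:
    body = "\n".join(lines).strip()
    if title or body:  # skip an empty preamble before the first marker
        cells.append((title, body))


def split_cells(source: str) -> list[tuple[str, str]]:
    """Split a percent-format script into (title, body) pairs."""
    cells: list[tuple[str, str]] = []
    title, lines = "", []
    for line in source.splitlines():
        match = CELL_MARKER.match(line)
        if match:
            _append_cell(cells, title, lines)
            title, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    _append_cell(cells, title, lines)
    return cells


def group_by_section(cells: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group cells by the first word of their title; untitled cells go to "misc"."""
    sections: dict[str, list[str]] = {}
    for title, body in cells:
        key = title.split()[0].lower() if title else "misc"
        sections.setdefault(key, []).append(f"# %% {title}".rstrip() + "\n" + body)
    return sections
```

Each `sections[key]` list could then be joined and written to its own file; shared imports in the untitled preamble would need to be copied into every file.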
i don't think we need to start from scratch and i'm not a huge fan of sphinx tbh.
there are some preliminary example notebooks already up on the current docs page. e.g. see https://pymatviz.janosh.dev/notebooks/mp_bimodal_e_form.
you can get a list of them by hitting cmd + k to bring up the nav palette on the current docs. i think just converting those notebooks to python scripts and adding more of them would be great!
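For converting the existing notebooks, `jupytext --to py:percent notebook.ipynb` should already do the job. If a dependency-free route is preferred, a minimal sketch that reads the notebook JSON and emits `# %%` cells (which VS Code's Interactive Window can run) could look like this. It drops outputs and raw cells, and the function name is hypothetical:

```python
from __future__ import annotations

import json


def notebook_to_percent_script(nb_json: str) -> str:
    """Convert .ipynb JSON text to a percent-format Python script.

    Code cells become "# %%" cells and markdown cells become commented
    "# %% [markdown]" cells; outputs and raw cells are dropped.
    """
    notebook = json.loads(nb_json)
    chunks = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # nbformat allows source to be a list of lines or a single string
        text = ("".join(source) if isinstance(source, list) else source).rstrip()
        if cell["cell_type"] == "code":
            chunks.append(f"# %%\n{text}")
        elif cell["cell_type"] == "markdown":
            commented = "\n".join(
                f"# {line}" if line else "#" for line in text.splitlines()
            )
            chunks.append(f"# %% [markdown]\n{commented}")
    return "\n\n".join(chunks) + "\n"
```

The result can be written next to the notebook with `pathlib.Path.write_text` and then edited like any other script.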
This simple example
embarrassingly raises