toolsforexperiments / plottr

A flexible plotting and data analysis tool.
https://github.com/toolsforexperiments/plottr
MIT License
46 stars 55 forks

Feature: Find datadicts matching a set of conditions #379

yoshi74ls181 closed 1 year ago

yoshi74ls181 commented 1 year ago

This pull request adds a function, plottr.data.datadict_storage.search_datadicts, which returns an iterator over the datadicts that match a set of conditions. The following conditions are currently supported:

For convenience, I've also added a function, plottr.data.datadict_storage.search_datadict, which asserts that exactly one datadict matches.
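
The "exactly one match" behavior described above can be sketched independently of plottr. The helper name exactly_one is hypothetical and is not taken from the pull request. The sketch also assumes that a ValueError is raised for zero or multiple matches, whereas the pull request is described as using an assertion.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def exactly_one(matches: Iterable[T]) -> T:
    """Return the only item in `matches`; raise if there are zero or several.

    Hypothetical helper for illustration only, not plottr's implementation.
    """
    iterator: Iterator[T] = iter(matches)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("no match found") from None
    # Pull at most one more item: if it exists, the result is ambiguous.
    for _ in iterator:
        raise ValueError("more than one match found")
    return first
```

Because the helper only ever pulls two items from the iterator, it stays cheap even when the underlying search is lazy and each match is a large dataset.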

yoshi74ls181 commented 1 year ago

Resolved a merge conflict with #375.

marcosfrenkel commented 1 year ago

I really like this feature! But at the moment, if the search encounters any invalid data (the writer always creates a file, even if nothing is inside it), the whole search fails. Because of this, it is hard to test on my end.

I am also a little unsure whether it's a good idea for search_datadicts to return a generator instead of a list of all the matching datadicts. The generator is useful because the datadicts might be big, but offering both the generator and a function that returns a list might be a good idea too, and it shouldn't take much effort. @wpfff what do you think?
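
The generator-plus-list idea can be sketched without plottr. Both function names below, find_matches and find_matches_list, are hypothetical stand-ins and are not part of the pull request. The point is only that an eager variant can be a thin wrapper around the lazy one.

```python
from typing import Iterator, List, Tuple


def find_matches(names: List[str], wanted: str) -> Iterator[Tuple[int, str]]:
    """Stand-in for a lazy search: yield (index, name) pairs one at a time."""
    for index, name in enumerate(names):
        if name == wanted:
            yield index, name


def find_matches_list(names: List[str], wanted: str) -> List[Tuple[int, str]]:
    """Eager variant: collect every match from the generator into a list."""
    return list(find_matches(names, wanted))
```

Callers that want to stream through large results would use the generator, while callers that need len() or indexing would use the list variant, at the cost of holding all matches in memory at once.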

yoshi74ls181 commented 1 year ago

Thanks! I think I've resolved the error you encountered by fixing a bug in datadict_from_hdf5. Could you test this again?

yoshi74ls181 commented 1 year ago

Added the following search conditions:

marcosfrenkel commented 1 year ago

Hello, sorry for the late response; it's been a busy couple of weeks.

I remember being able to test this, but no matter what I try now, the generator is always empty. @yoshi74ls181 could you give me an example of how it is supposed to be used?

yoshi74ls181 commented 1 year ago

No worries! Sorry about flooding you with so many pull requests recently; I don't mean to rush you at all.

Here's a usage example:

from plottr.data.datadict_storage import DataDict, DDH5Writer, search_datadicts, search_datadict

basedir = "C:\\plottr-data"

# create two datasets
data = DataDict(x=dict(), y=dict(axes=["x"]))
with DDH5Writer(data, basedir, name="test") as writer:
    writer.add_data(x=[1, 2, 3], y=[1, 2, 3])
data = DataDict(x=dict(), y=dict(axes=["x"]))
with DDH5Writer(data, basedir, name="test") as writer:
    writer.add_data(x=[1, 2, 3], y=[3, 2, 1])

# print all datasets named "test" from today
for foldername, datadict in search_datadicts(basedir, "2023-03-17", name="test"):
    print(foldername, datadict["x"]["values"], datadict["y"]["values"])

# print just the newest one
foldername, datadict = search_datadict(basedir, "2023-03-17", name="test", newest=True)
print(foldername, datadict["x"]["values"], datadict["y"]["values"])

# print the one with specific date and time
foldername, datadict = search_datadict(basedir, "2023-03-17T200540", name="test")
print(foldername, datadict["x"]["values"], datadict["y"]["values"])

wpfff commented 1 year ago

@yoshi74ls181 off-topic, but i couldn't find a way to message you in a different way :) it was great meeting you at the APS meeting! could you maybe let me know your email address? (you can email me directly at wpfaff at illinois dot edu)

yoshi74ls181 commented 1 year ago

@wpfff Have you received my email? I'm worried that it might have ended up in your spam folder because I sent it from my personal gmail account (I lost access to my university email when I graduated). No worries if it's just that you've been busy.

wpfff commented 1 year ago

this function is useful, and we have a similar one in our lab code -- but i'm not sure it should be part of plottr itself. there are a few conceptual issues:

we're currently thinking about how to filter better in monitr, but we're not sure yet about the correct approach. I'm closing this for now, and we can re-open if needed.