materialsproject / fireworks

The Fireworks Workflow Management Repo.
https://materialsproject.github.io/fireworks

Extended get files by query task #384

Closed (jotelha closed 4 years ago)

jotelha commented 4 years ago

A suggestion:

Turns the previously contributed "GetFilesByQueryTask" into something useful:

Other additions:

Best regards,

Johannes

computron commented 4 years ago

Thanks! Can you clarify the reason for switching from dot to arrow? You wrote "allows for nested queries to be stored" but I don't think I understood why a nested query can't be stored if there is a "." in it.

jotelha commented 4 years ago

Queries are expected as nested dicts rather than plain strings, so aliasing the . (dot) separator as -> makes it possible to store queries like this:

import pymongo

# project_id is assumed to be defined elsewhere
ft = GetFilesByQueryTask(
    query={
        'metadata->project': project_id,
        'metadata->type': 'surfactant_file',
    },
    sort_key='metadata.datetime',
    sort_direction=pymongo.DESCENDING,
    limit=1,
    new_file_names=['default.pdb'])

If I remember correctly, the MongoDB language does not allow for dots in keys. It's the same issue as in the dict_mods.py file at https://github.com/materialsproject/fireworks/blob/07bace776fedefd09907272334a2c5925ffce51d/fireworks/utilities/dict_mods.py#L55-L58.
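
The aliasing itself can be sketched as a small recursive key rewrite. This is only an illustration and not the code in dict_mods.py: the helper name `arrows_to_dots` is hypothetical, and it assumes the query consists solely of dicts, lists, and scalar values.

```python
def arrows_to_dots(obj):
    """Return a copy of obj with '->' in dict keys replaced by '.'.

    Hypothetical helper for illustration; recurses into nested dicts and lists.
    """
    if isinstance(obj, dict):
        return {
            (key.replace("->", ".") if isinstance(key, str) else key): arrows_to_dots(value)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [arrows_to_dots(item) for item in obj]
    return obj


# The arrow form is what gets stored; the dot form is what is sent as the query.
stored_query = {"metadata->project": "my_project", "metadata->type": "surfactant_file"}
mongo_query = arrows_to_dots(stored_query)
```

The stored dict is left untouched, so the arrow form can still be serialized after the dotted query has been built.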

computron commented 4 years ago

Ah yes, I remember - MongoDB doesn't allow storing dictionaries where the keys have a dot in them. So storing the parameter:

{"query": {"key.subkey": "value"}}

can't be done, which makes it difficult to serialize the FireTask. The arrows should indeed make it possible to store the query and thereby serialize the FireTask. Merging this now along with the other improvements, thanks!
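
To see which parameters would hit this restriction before attempting to store them, a check along the following lines could be used. It is a sketch only: `find_dotted_keys` is a hypothetical name and not part of FireWorks or pymongo, and it inspects plain dicts and lists without contacting a database.

```python
def find_dotted_keys(obj, path=()):
    """Yield the path (as a tuple) to every dict key that contains a dot."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            here = path + (key,)
            if isinstance(key, str) and "." in key:
                yield here
            # Keep descending, nested documents may contain further dotted keys.
            yield from find_dotted_keys(value, here)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_dotted_keys(item, path + (index,))


dotted_params = {"query": {"key.subkey": "value"}}
arrow_params = {"query": {"key->subkey": "value"}}

offending = list(find_dotted_keys(dotted_params))
clean = list(find_dotted_keys(arrow_params))
```

Here `offending` holds the path to the dotted key, while `clean` is empty because the arrow form contains no dots in its keys.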

jotelha commented 4 years ago

Exactly. A related note: similarly, I believe it is not possible to store any query involving $-prefixed operators, e.g.

{'metadata.datetime': {'$gt': '2020'} }

thus it might be a good idea to store queries as plain strings instead. Are there any mongo-language-specific serialization recommendations for query documents?
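
One way the plain-string idea could look is a JSON round trip using only the standard library. This is a sketch under assumptions: the function names `query_to_string` and `string_to_query` are hypothetical, and it only covers queries whose values are JSON-serializable (strings, numbers, lists, dicts). Values such as datetime objects would need an extended JSON encoder, for example `bson.json_util`.

```python
import json


def query_to_string(query):
    """Serialize a query dict to a JSON string that is safe to store as a plain value."""
    return json.dumps(query, sort_keys=True)


def string_to_query(stored):
    """Restore the query dict from its stored JSON string."""
    return json.loads(stored)


query = {"metadata.datetime": {"$gt": "2020"}}
stored = query_to_string(query)    # a str, so no dotted or $-prefixed keys are stored
restored = string_to_query(stored)
```

Because the dots and `$` operators live inside a string value rather than in document keys, the field-name restrictions no longer apply to the stored form.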

computron commented 4 years ago

I don't know of any mongo language specific serialization recommendations; it's possible that a simple string is best.

As an aside, it looks like as of MongoDB 3.6+, dots are allowed in key names. But dollar sign prefixes are still prohibited:

https://docs.mongodb.com/manual/reference/limits/#Restrictions-on-Field-Names