Open agoscinski opened 5 months ago
I'm not really sold on something like this ... From the example, I feel it makes reading the code a lot harder: as a new contributor, if I see something like `random_uniform_tensor(2, (5, 2), 1)` I have no idea what is happening.
Is the intended use case to make writing tests easier, or to provide a new operation for everyone to use? In both cases, we already have `block_from_array`, which is not used that much in the existing tests but is already intended as an easier way to wrap data in metatensor.
We could provide a `tensor_from_arrays(keys: Labels, blocks: List[Array]) -> TensorMap` that would do something similar to your suggestion, but I doubt it would be used a lot: for tests we need to add more edge cases, with samples present in some but not all blocks, etc.
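To make the "samples present in some but not all blocks" point concrete, here is a minimal stand-alone sketch (plain Python dicts standing in for blocks, no metatensor types; all names are hypothetical) of the kind of fixture such edge-case tests need, which a generic random helper would not produce on its own:

```python
# Hypothetical fixture sketch: plain dicts standing in for TensorBlocks.
# Block (1,) deliberately misses sample 1 -- an edge case a generic
# random-tensor helper would not generate by itself.
blocks = {
    (0,): {"samples": [0, 1, 2], "values": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]},
    (1,): {"samples": [0, 2], "values": [[0.7, 0.8], [0.9, 1.0]]},
}

# An operation under test would have to handle the non-shared samples:
shared = sorted(set(blocks[(0,)]["samples"]) & set(blocks[(1,)]["samples"]))
```

Fixtures like this usually have to be hand-written per test, which is why a random helper only covers the "happy path" constructions.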
Yes, but that also means there is some generic construction of a tensor map that every test seems to need. In every subpackage we have a `utils.py` with some arbitrarily complicated tensor map definition that is documented as a "dummy tensor map". The test-specific edge-case constructions are usually in the test file itself. We also need something with gradients that can be used as a base test for all subpackages.
I am also okay, for now, with always using one of the stored tensor maps (`./python/metatensor-operations/tests/data/qm7-power-spectrum.npz`), but the utility to load them in a dispatched way exists only in the metatensor-torch subpackage, so I would need to copy it to metatensor-learn. By having something like this in operations, every subpackage could use it and code redundancy would be greatly reduced.
> for users putting their data in metatensor, they will want more control over the samples and components names & values
The current `from_array` method and the one you suggested don't include gradients, and I have found myself spending time several times constructing a tensor map because I wanted to play around with gradients. I think at least one utility operation should simplify the creation of a tensor map with gradients. I am open to other suggestions, but `tensor_from_array` does not solve this.
So my main point is that if this is only intended for tests, we should not put it in the main API. I'd be fine with something like `metatensor.operations.test_utils.random_tensor` (where one would have to explicitly import `metatensor.operations.test_utils` to access it). I'm not convinced this will be used by a lot of tests, but I'm happy to give it a go!
I feel like this would be plenty for the API:

```python
def random_tensor(
    *,  # force all arguments to be keyword arguments
    n_blocks: int,
    shape: Optional[List[int]] = None,  # defaults to 8x3 or something
    gradients: Optional[List[str]] = None,
) -> TensorMap:
    ...
```
I think `device` and `dtype` should be handled by `metatensor.to()`.
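For illustration, here is a self-contained sketch of the semantics such a helper could have, including the `gradients` option. It is standard-library Python only, with no metatensor types: the function name `random_tensor_data`, the 8x3 default, and the gradient layout (one extra leading axis of size 3 per gradient parameter) are all assumptions, not the real proposal:

```python
import random
from typing import Dict, List, Optional


def random_tensor_data(
    *,
    n_blocks: int,
    shape: Optional[List[int]] = None,
    gradients: Optional[List[str]] = None,
    seed: Optional[int] = None,
) -> Dict[int, dict]:
    """Generate uniform random values as nested lists, one entry per block,
    plus optional gradient data with one extra leading axis of size 3."""
    rng = random.Random(seed)
    shape = shape or [8, 3]  # mirror the suggested "8x3 or something" default

    def values(s: List[int]):
        # recursively build a nested list with the requested shape
        if len(s) == 1:
            return [rng.random() for _ in range(s[0])]
        return [values(s[1:]) for _ in range(s[0])]

    out = {}
    for key in range(n_blocks):
        block = {"values": values(shape)}
        for parameter in gradients or []:
            # hypothetical layout: gradients get one extra leading axis
            block[f"gradient_{parameter}"] = values([3, *shape])
        out[key] = block
    return out


data = random_tensor_data(n_blocks=2, gradients=["positions"], seed=0)
```

A seeded call like the one above would make test fixtures reproducible, which is probably desirable for this use case.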
The goal is to reduce duplicate code in the tests due to the creation of tensor maps, for example in the `*_like` operations. I also thought about more flexibility, but I think supporting multi-column labels makes it more complicated without much gain in usage. If you need labels with more than one column, you usually have a specific use case in mind and need to specify the exact Labels.
Do you think such an operation would be useful in the first suggested form?