Open JustinShenk opened 3 years ago
@Saran-nns was the current dataset.py written by you? Do you mind if it is hacked up to output a dataframe instead of a Torch tensor?
Apart from what is mentioned above, dataset.py at PR #26 contains additional functions to prepare the data loaders. It borrows several utility functions from datasets.utils to extract and preprocess the data. So I guess it would be convenient to set up a new helper function in datasets.utils that creates a traja dataframe from a CSV file or one of the available datasets; it could then be called inside datasets.utils.generate_dataset(df, n_past, n_future).
At the moment, generate_dataset(df, n_past, n_future) in datasets.utils receives a pd.DataFrame as input and returns tensors of the train and test time-series datasets, along with the corresponding categories (IDs), which are then fed into the dataloaders.
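For readers unfamiliar with this step, the kind of windowing that a function like generate_dataset performs can be sketched roughly as follows. This is a minimal pure-Python illustration and not the code from PR #26: make_windows and the toy tracks are hypothetical names, and the real function takes a DataFrame, returns Torch tensors, and also handles the train/test split.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def make_windows(
    tracks: Dict[str, Sequence[Point]], n_past: int, n_future: int
) -> Tuple[List[List[Point]], List[List[Point]], List[str]]:
    """Cut each track into (past, future) pairs, remembering the track ID."""
    past, future, ids = [], [], []
    for track_id, points in tracks.items():
        # Each window needs n_past + n_future consecutive points;
        # tracks that are too short simply yield no windows.
        for start in range(len(points) - n_past - n_future + 1):
            split = start + n_past
            past.append(list(points[start:split]))
            future.append(list(points[split:split + n_future]))
            ids.append(track_id)
    return past, future, ids


# Hypothetical toy data: two short tracks keyed by ID.
tracks = {
    "a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
    "b": [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)],
}
past, future, ids = make_windows(tracks, n_past=2, n_future=1)
```

Keeping the IDs in a list parallel to the windows is what lets the categories travel with each sample into the dataloaders.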
So we expect a separate utility function for the available datasets, such as:
import pandas as pd
import traja


def load_data(dataset: str):
    # Precheck: look up the named dataset
    try:
        path = traja.datasets.utils.load_data(dataset)
    except Exception as err:
        raise ValueError(
            f"{dataset} is not in {list(traja.datasets.utils.available())}"
        ) from err
    # Load the data: read the CSV file using pandas
    df = pd.read_csv(path)
    return traja.TrajaDataFrame(df)
Once this is done, we can easily make the traja dataframe the default data format by replacing the isinstance check against pd.DataFrame with one against traja.TrajaDataFrame inside traja.datasets.utils.generate_dataset().
@justinshenk the current handling is intended to be a middle ground between Torch and Pandas. The neural networks require time series and just about nothing else does, so time series are handled as tensors. However, I agree that the networks should output dataframes when they are 'done' so things can interoperate with the rest of Traja. I am just a bit unclear on the finer details of this interface.
We haven't added the functions for post-training predictions/inferences yet. I will update Trainer to return the network prediction on the test dataset as traja data frame.
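One possible shape for that return path, sketched under assumptions: the predictions are flattened into long-format rows (one row per predicted step) before being handed to pandas. The function name predictions_to_records and the "id" and "step" column names are hypothetical; nothing here is taken from the Trainer code.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def predictions_to_records(
    predictions: Sequence[Sequence[Point]], ids: Sequence[str]
) -> List[Dict[str, object]]:
    """Flatten per-sequence predictions into long-format rows (one per step)."""
    if len(predictions) != len(ids):
        raise ValueError("need exactly one ID per predicted sequence")
    records = []
    for seq_id, sequence in zip(ids, predictions):
        for step, (x, y) in enumerate(sequence):
            records.append({"id": seq_id, "step": step, "x": x, "y": y})
    return records


# Hypothetical network output: two predicted sequences of two steps each.
records = predictions_to_records(
    [[(1.0, 2.0), (1.5, 2.5)], [(0.0, 0.0), (0.1, 0.2)]], ids=["a", "b"]
)
```

A list of dicts like this can be passed to pd.DataFrame(records) and then wrapped with traja.TrajaDataFrame(df); with a real Torch tensor, tensor.tolist() would give the nested lists first.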
@WolfByttner I am preparing the UML diagram for traja commit #26. That should help guide collaborators.
This (rather huge) Mallard dataset has temperature as a possible regression parameter: https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study3109235
You also have geese here (with temperatures, though slightly less volatile): https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study83912796
https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study577905925 - This dataset has genders and temporal classes. Very interesting
https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study933711994
Enable loading trajectory datasets via the Traja API:
An early attempt, designed for pedestrian datasets (hence ped_id): https://github.com/traja-team/traja/blob/master/traja/datasets/dataset.py and data/loader.py.
id.
Returns a TrajaDataFrame (a pandas DataFrame converted via trj = traja.TrajaDataFrame(df); see https://traja.readthedocs.io/en/latest/reading.html for more on this).
A similar API to GeoPandas would be nice (https://stackoverflow.com/a/51625390/6256888), e.g. traja.datasets.available. Look here for more inspiration: https://github.com/geopandas/geopandas/tree/master/geopandas/datasets.
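To make the GeoPandas-style idea concrete, here is a minimal sketch of a name-to-file registry that exposes an available list and a get_path lookup. Everything in it is an assumption for illustration: the registry contents, the directory layout, and the dataset names are hypothetical and do not describe files that ship with Traja.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical registry: dataset name -> CSV file shipped with the package.
_DATA_DIR = Path("traja") / "datasets" / "data"
_REGISTRY: Dict[str, Path] = {
    "geese": _DATA_DIR / "geese.csv",
    "mallard": _DATA_DIR / "mallard.csv",
}

# Expose the known names as a plain list, in the GeoPandas style.
available: List[str] = sorted(_REGISTRY)


def get_path(name: str) -> Path:
    """Return the file path for a registered dataset name."""
    try:
        return _REGISTRY[name]
    except KeyError:
        # Report the valid names so that a typo is easy to spot.
        raise ValueError(
            f"The dataset '{name}' is not available. "
            f"Available datasets are {available}."
        ) from None
```

A caller could then pass get_path("geese") to pd.read_csv and wrap the result with traja.TrajaDataFrame(df), which keeps the lookup separate from the parsing.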