[JOSS] ValueError in Example 4 notebook

ni1o1 / transbigdata

A Python package developed for transportation spatio-temporal big data processing, analysis and visualization.

https://transbigdata.readthedocs.io/en/latest/

BSD 3-Clause "New" or "Revised" License


[JOSS] ValueError in Example 4 notebook #12

Closed anitagraser closed 2 years ago

anitagraser commented 2 years ago

In Example 4-pNEUMA trajectory dataset processing.ipynb, I get the following error:

xref: openjournals/joss-reviews#4021

ni1o1 commented 2 years ago

I ran Example 4-pNEUMA trajectory dataset processing.ipynb again but did not encounter the above problem. This error happens when the input GeoDataFrame tmpgdf_data is empty. Perhaps the content of tmpgdf_data was lost in a previous step.
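A minimal sketch of a guard that turns the empty-input case into a clearer error before the nearest-node step runs. The helper name require_non_empty is hypothetical and not part of transbigdata; it relies only on len(), which pandas DataFrames and GeoDataFrames support.

```python
def require_non_empty(table, name):
    """Raise a descriptive error if `table` has no rows.

    Works for any object that supports len(), which includes
    pandas DataFrames and geopandas GeoDataFrames.
    """
    if len(table) == 0:
        raise ValueError(
            f"{name} is empty; re-run the earlier cells and check that "
            "the filter still matches rows in the loaded dataset"
        )
    return table


# Hypothetical notebook usage, placed just before the nearest-node lookup:
# tmpgdf_data = require_non_empty(tmpgdf_data, "tmpgdf_data")
```

With a check like this, the failure message names the empty variable instead of surfacing as a ValueError from deeper in the call.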

anitagraser commented 2 years ago

It looks like there is no track_id 2138 in the provided dataset. The maximum in gdf_data is 157:

yuanjian24 commented 2 years ago

Hi there. According to your screenshot, gdf_data has only 188 rows, but the correct data should contain 10674 rows. Please make sure that you have the right dataset by reading this file:

data = pd.read_csv('data/pNEUMA_tbd_sample.csv')

and then run every cell before the one that raises the error.

Alternatively, you can replace tmp_gdf_data with gdf_data directly:

# get the nearest node of each point on the map
gdf_data = tbd.ckdnearest_point(gdf_data, nodes)

# extract the origin/destination (o/d) nodes from the first and last points
o_index, d_index = gdf_data.iloc[0]['index'], gdf_data.iloc[-1]['index']
o_node_id = nodes[nodes['index'] == o_index].index[0]
d_node_id = nodes[nodes['index'] == d_index].index[0]
print(o_node_id, d_node_id)
gdf_data.head()

anitagraser commented 2 years ago

I just pulled the latest version from the repo and the issue remains.

gdf_data is created from data_sparsify_clean, which only has 128 rows:

This has to be fixed in the notebook.

yuanjian24 commented 2 years ago

This is a little weird. I have double-checked the code. To solve the problem, I have saved the data_sparsify DataFrame as a CSV file: data/data_sparsify.csv. Please pull again and rerun all the cells to see if it works.

If it still doesn't work, use data_sparsify to test tbd.ckdnearest_point.

anitagraser commented 2 years ago

Rerunning the notebook overwrites the CSV you provided, so there should probably be a check for whether the file already exists.
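One way such a check could look is sketched below. The helper write_if_missing is hypothetical and is not the code that was later added to the notebook; it only writes when no file is present at the target path.

```python
from pathlib import Path


def write_if_missing(path, write_func):
    """Call write_func(path) only when no file exists at path.

    Returns True if the file was written, False if it was left untouched.
    """
    path = Path(path)
    if path.exists():
        return False
    # Create the parent folder (e.g. data/) if it is not there yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    write_func(path)
    return True


# Hypothetical notebook usage, assuming data_sparsify is a pandas DataFrame:
# write_if_missing("data/data_sparsify.csv",
#                  lambda p: data_sparsify.to_csv(p))
```

Passing the writer as a callable keeps the existence check separate from how the file is produced, so the same guard works for any output format.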

With your file, the following cells run.

One strange effect is that I get the following plot:

whereas the checked-in version shows:

yuanjian24 commented 2 years ago

Hi, I have updated the Example 4 notebook:

The file data/data_sparsify.csv will no longer be overwritten: I added a file check and a checkpoint. You can start from here:

Regarding the second problem, visualization requires a GeoDataFrame (with a 'geometry' column). The code in the red box prints the list of vehicles ordered by their number of points. You can choose any vehicle or follow my code:

jGaboardi commented 2 years ago

@anitagraser hmmm. I did not have any issues with the original Example 4-pNEUMA trajectory dataset processing.ipynb and could reproduce the notebook.

anitagraser commented 2 years ago

Might be a local issue then. I'm testing on Win10 with Conda.

In my projects, I find it helpful to set up the notebooks so they can be tested on MyBinder. It gives some extra assurance that the notebooks also work on machines other than my own.

anitagraser commented 2 years ago

Fixed in current version