uw-ssec / MAWpy

Mobility Analysis Workflow in Python
https://mawpy.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Define standard necessary column names for input data. #24

Open anujsinha3 opened 2 weeks ago

anujsinha3 commented 2 weeks ago

Currently, each column in the CSV file is accessed by an integer index. This has the following limitations:

  1. The workflow fails whenever the input CSV columns are in a different order, because every column is tied to a fixed position with no flexibility.
  2. The source code becomes convoluted and difficult to comprehend.
  3. Code vectorization is difficult, and the heavy use of nested loops hurts performance.

We plan to use pandas data frames going forward, for which we need to standardize the column names that will be part of the input CSV file.
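
A minimal sketch of what name-based access could look like, assuming a subset of the column names listed in this issue. The sample data and the `load_input` helper are hypothetical and are not existing MAWpy code:

```python
import io

import pandas as pd

# Hypothetical sample files: same data, columns deliberately in a different order.
CSV_A = "user_ID,unix_start_t,orig_lat,orig_long\nu1,1700000000,47.65,-122.30\n"
CSV_B = "orig_long,orig_lat,user_ID,unix_start_t\n-122.30,47.65,u1,1700000000\n"

# Assumed column names, taken from the list proposed in this issue.
INPUT_COLUMNS = ["user_ID", "unix_start_t", "orig_lat", "orig_long"]


def load_input(source):
    """Read a CSV and return its columns selected by name, in a fixed order."""
    df = pd.read_csv(source)
    return df[INPUT_COLUMNS]


df_a = load_input(io.StringIO(CSV_A))
df_b = load_input(io.StringIO(CSV_B))

# Both frames end up identical even though the files order their columns differently.
print(df_a.equals(df_b))
```

Selecting by name means the column order in the file no longer matters, and downstream code can refer to `df["orig_lat"]` instead of a positional index.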

Existing column names (please confirm whether these are the standard ones or whether any change is required): "unix_start_t", "user_ID", "orig_lat", "orig_long", "orig_unc", "stay_lat", "stay_long", "stay_unc", "stay_dur", "stay_ind", "human_start_t"

gracejia513 commented 1 week ago

Hi Anuj, I believe these columns are not part of the input file: "stay_lat", "stay_long", "stay_unc", "stay_dur", "stay_ind", "human_start_t"

However, we can use them as standardized column names for the output file.

Anurag19101996 commented 1 week ago

Hi @anujsinha3, the input column names are as follows:

  1. User_ID
  2. Orig_lat
  3. Orig_long
  4. Datetime
  5. Orig_Unc (Not mandatory)
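
One way the workflow could enforce this list is a small check at load time. This is only a sketch: the `validate_columns` helper and the sample row are hypothetical, and the column spellings are copied from the list above.

```python
import io

import pandas as pd

# Column names as listed above; Orig_Unc is treated as optional.
REQUIRED_COLUMNS = ["User_ID", "Orig_lat", "Orig_long", "Datetime"]
OPTIONAL_COLUMNS = ["Orig_Unc"]


def validate_columns(df):
    """Raise if a required column is missing; return columns in a standard order."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Input file is missing required columns: {missing}")
    present_optional = [c for c in OPTIONAL_COLUMNS if c in df.columns]
    return df[REQUIRED_COLUMNS + present_optional]


# Hypothetical input without the optional Orig_Unc column, in arbitrary order.
sample = io.StringIO(
    "Datetime,User_ID,Orig_long,Orig_lat\n"
    "2024-06-20 10:00:00,u1,-122.30,47.65\n"
)
df = validate_columns(pd.read_csv(sample))
print(list(df.columns))  # ['User_ID', 'Orig_lat', 'Orig_long', 'Datetime']
```

Failing early with the list of missing names gives a clearer error than an index lookup going wrong somewhere deeper in the workflow.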

For output columns, please take a look at the output column names below: https://uwnetid.sharepoint.com/sites/og_ssec_escience/_layouts/15/Doc.aspx?sourcedoc={6b0ea251-f0a8-4ce3-8ea9-d1d796dcf28f}&action=edit&wd=target%28Meeting%20Notes.one%7Cb0d0aa65-2fb3-4a83-9b78-ca0f174e22b5%2F2024.06.20%20UW%20internal%20meeting%7C4f2f25d4-7499-4f2c-aa8a-252b79a0cfdf%2F%29&wdorigin=NavigationUrl

image

image

Anurag19101996 commented 1 week ago

@gracejia513 Please confirm.