Epic: Upload a CSV - Githubissues

russorat commented 3 years ago

Problem & Opportunity

Users are having trouble loading data that's in a .csv file format through the UI. This is because the UI drag-and-drop section for loading .csv data requires the flux annotated .csv format. It's not obvious to new users what those annotations have to be, despite the documentation available to them. This is causing users to get stuck at the very beginning of their journey with what should be a very simple .csv file upload and data exploration. The most common reason that .csv files fail to upload is the missing annotations. Our opportunity is to design out this common user failure by offering a clean, user-friendly, step-by-step .csv annotation applier for users' raw .csv files that are missing annotations. Simultaneously, we will maintain the ability to upload properly annotated .csv files. This capability will help users regardless of their ingestion preference, because inspecting the shape of your data via .csv files is an efficient way to quickly make sure your ingestion method, preferred measurements, fields, and tags meet your data analytics or application development needs.

Core Team

Product Management: Amy Luckette
Engineering: Kristina Robinson and Bucky S.
Design: Tara Kirkland

Figma Design

https://www.figma.com/file/U2zSUPmTAkOofGEpQkNe5T/Upload-a-CSV?node-id=211%3A1640

Design Principles

Reduce the likelihood of .csv upload failures
Offer .csv annotation/parser tooling in the UI
Reduce the need for users to spend hours heads-down in documentation
Improve in-app education about required annotations and their implications

Basic Expectations

The user should be able to upload a non-annotated .csv file through the UI and easily be guided through adding required annotations.
If a user has an annotated CSV, they should still be able to upload it via drag and drop.
The user should be presented with an error message in the event that the upload failed. Update the error message accordingly.

Test .csv Files

Here are some sample annotated CSVs:

User Stories and ACs

User Story: Manage by Feature Flag

    ACs
    - Manage by a feature flag

User Story: Upload your File

     ACs
    - The first section of the page should be called "Upload your File." 
    - Note: The Figma may make it seem like this section is on its own page - it is NOT. The entire new "Upload a CSV" experience should be on ONE PAGE with a scroll bar available for the whole page. 
    - Underneath the "Upload your File" parent section title, the user should see the following text: "You can upload a standard CSV or a Flux annotated CSV. Browser upload limits apply (typically 4GB)."
    - The first sub-section within the "Upload your File" parent section should be "Example CSVs" organized by type via tab options (see Figma)
    - The first tab option on the left should be "Standard CSV" while the option on the right should be "Annotated CSV"
    - If the user clicks on the "Annotated CSV" tab option, the following text should display below the tabs: "For more detailed information, check out the [Annotated CSV Documentation](https://docs.influxdata.com/influxdb/cloud/reference/syntax/annotated-csv/#annotated-csv-in-flux)"
    - If the user clicks on the "Standard CSV" tab option, the following text should display below the tabs: "Your CSV file should contain one header row with column names, and row values separated by commas."  
    - Directly below that text, display examples of either annotated or standard .csv file formats so that visual learners can quickly understand the differences and what's expected. The example that displays will depend on the tab option that the user chooses. (see Figma)
    - The second sub-section  of the "Upload your File" parent section should be "Select bucket"
    - Display the same "Select bucket" component that's used in the firstMile/Onboarding wizards. Use the select bucket component as is, with no changes. 
    - The third and final sub-section of the "Upload your File" parent section should be "Add your file"
    - Display a modified version of the drag-and-drop/click to upload .csv component. Summary of changes here:
         - Modify the drag and drop zone so that the component does not send the data automatically to InfluxDB.
         - Remove the buttons from the drag-and-drop component (See uploaded data & upload more data)
         - Drag and drop component should just serve the purpose of allowing the user to preview their file NOT write it to the DB. Writing the data will be a separate step triggered by the "Finish Upload" button at the bottom of the screen. 
         - Drag and drop component will also no longer be the place to display write errors. 
         - Ability to remove the file that they dragged and dropped (option to clear the drop zone via an "x" next to the file name in the drop zone)
    - Error Handling - Display errors in the standard upper right-hand corner of the screen. 
         - Ensure the file meets basic requirements such as file size and # of rows. 
         - Throw an error if the user is missing data. For example, make sure that every row has the same number of commas. If low effort, tell the user which line is missing data.
         - Must be a comma-delimited file        
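
A minimal sketch of the "same number of commas on every row" check from the error-handling ACs above. The names `CsvRowProblem`, `countCsvFields`, and `findRaggedRows` are hypothetical and not taken from the UI codebase. It assumes quoted values do not contain line breaks, and it treats commas inside double quotes as data rather than delimiters.

```typescript
// Hypothetical shape for one validation problem (illustration only).
interface CsvRowProblem {
  line: number // 1-based line number in the uploaded file
  expected: number // field count of the header row
  actual: number // field count found on this line
}

// Count comma-separated fields on one line. Commas inside double-quoted
// values are ignored. A doubled quote ("") toggles the flag twice, so it
// does not change the count.
function countCsvFields(line: string): number {
  let fields = 1
  let inQuotes = false
  for (let i = 0; i < line.length; i += 1) {
    const ch = line.charAt(i)
    if (ch === '"') {
      inQuotes = !inQuotes
    } else if (ch === ',' && !inQuotes) {
      fields += 1
    }
  }
  return fields
}

// Compare every non-empty line against the header's field count and
// report the lines that differ, so the UI can name the offending line.
function findRaggedRows(csvText: string): CsvRowProblem[] {
  const lines = csvText.split(/\r?\n/)
  const expected = countCsvFields(lines[0] ?? '')
  const problems: CsvRowProblem[] = []
  lines.forEach((text, index) => {
    if (index === 0 || text.trim() === '') {
      return
    }
    const actual = countCsvFields(text)
    if (actual !== expected) {
      problems.push({line: index + 1, expected, actual})
    }
  })
  return problems
}
```

An empty result would mean the file passes this particular check. Each entry otherwise carries the line number needed for a message such as "line 3 has 2 values, expected 3".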

    Implementation
    - "I don't think we have a file size limit in code, but browsers limit how big of a file you can upload. Firefox is 2GB, chrome is 4GB...maybe they're all 4GB these days?" - Bucky
    - Looks like it's configurable by environment via the feature flag:
        # 27 * 1024 * 1024 = 28311552 --> 27 MiB is the current limit for UI CSV parsing
        name: UI CSV byte limit Schema Browser
        description: Controls the CSV byte limit for CSVs streamed from the API for Schema Browser
        contact: QX Team, Ariel
        key: increaseCsvLimit
        default: 28311552
        lifetime: Permanent
        expose: true
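
A minimal sketch of a client-side size check against the default quoted in the flag definition above. `DEFAULT_CSV_BYTE_LIMIT` and `checkCsvByteSize` are hypothetical names. The flag description mentions CSVs streamed from the API for the Schema Browser, so whether the same limit should govern uploads is an assumption here, as is the way the flag value reaches this function.

```typescript
// 27 * 1024 * 1024 = 28311552 bytes (27 MiB), the default in the flag above.
const DEFAULT_CSV_BYTE_LIMIT = 27 * 1024 * 1024

// Returns an error message when the file exceeds the limit, or null when
// it fits. `byteLimit` stands in for whatever the increaseCsvLimit flag
// resolves to in a given environment; reading the flag is out of scope.
function checkCsvByteSize(
  fileSizeBytes: number,
  byteLimit: number = DEFAULT_CSV_BYTE_LIMIT
): string | null {
  if (fileSizeBytes <= byteLimit) {
    return null
  }
  const limitMiB = (byteLimit / (1024 * 1024)).toFixed(1)
  return `File is ${fileSizeBytes} bytes; the limit is ${byteLimit} bytes (${limitMiB} MiB).`
}
```

In the browser, the `size` property of the dropped `File` object could be passed as `fileSizeBytes` before any parsing starts.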

User Story: Configure your File

    ACs
    - The next parent section should be "Configure your File"
    - Auto-scroll to this section after the user adds their file to the dropzone
    - Underneath the section title should be the following text: "Map your data to be compatible with InfluxDB ingestion."
    - Underneath that text, display an informational text box containing terms and definitions for Fields, Measurements, and Tags. (See Figma)
         - Field (required): Key-value pair for storing time-series data. For example, insect name and its count. You can have one field per record (row of data), and many fields per bucket. Key data type: string. Value data type: float, integer, string, or boolean.
         - Measurement (required): A category for your fields. In our example, it is census. You can have one measurement per record (row of data), and many measurements per bucket. Data type: string.
         - Tag (optional): Key-value pair for field metadata. For example, census location. You can have many tags per record (row of data) and per bucket. Key data type: string. Value data type: float, integer, string, or boolean.
    - Below the terms and definitions, display the .csv file name and display a Preview of the user's data.
    - The PREVIEW should behave dynamically according to the column mappings that the user assigned either via their annotations or via the annotation builder.
    - As the user selects their _measurement and _time mappings, dynamically update the _tag and _field list.        
    - ![Screen Shot 2022-11-07 at 10 43 13 AM](https://user-images.githubusercontent.com/112978673/200352439-0b3fa701-4c6e-4673-b95d-008f1a9d291e.png)        
    - _measurement and _time are required mappings (must be chosen/assigned prior to being able to click the "Finish Upload" button)
   - Below the PREVIEW, display the "Measurement Column" dropdown mapper. Display a red asterisk next to the "Measurement Column" so that the user knows it's required. As the user selects their measurement mapping, dynamically update the PREVIEW with their choice. Also, dynamically update the Tag and Field Column list. (See Figma)
   - Below the "Measurement Column" dropdown mapper, display the "Timestamp Column" dropdown mapper and the "Timestamp column format" dropdown. Display a red asterisk on both of these as well so that the user knows they're required. 
   - Underneath the "Timestamp Column" text yet above the timestamp column dropdown, display the following text: "Select your timestamp column"
   - Underneath the "Timestamp column format" text yet above the timestamp column format dropdown, display the following text: "Select the format your timestamps are currently in"
   - The two timestamp column format options displayed in the dropdown should be "UNIX UTC" and "ISO 8601"
         - Informational tooltip: "ISO 8601 represents date and time by starting with the year, followed by the month, the day, the hour, the minutes, seconds, and milliseconds. For example, 2020-07-10 15:00:00.000 represents the 10th of July 2020 at 3 p.m."
    - As the user selects their timestamp column mapping, dynamically update the PREVIEW with their choice. Also, dynamically update the Tag and Field Column list. (See Figma)    
    - Below the Timestamp column dropdown section, display the "Tag and Field Columns" (no red asterisk needed)
    - Any remaining unmapped column names should display in the "Tag and Field Columns" list with radio buttons to the right of each column name. 
    - The radio buttons for all remaining columns should default to the "field" option yet allow the user to change it to "tag" if they desire.  
    - One of the radio buttons MUST be chosen, do not allow the user to deselect both. It must be one or the other (tag or field). Reminder - default the radio button selections to "field." 
    - If the user changes their mind about the _measurement selection dropdown option, dynamically update the tag and field column list.  
   - If the user has a properly annotated .csv file - with a _measurement and _time column specified, then detect that these columns are already assigned and display the user's assigned column mappings in the preview box and show their assignments in the dropdowns/radio button sections. Even with annotated .csv files, allow the user to change their column assignments/mappings of tags and fields. 
    - If unannotated, parse just the first row of the file so that the user can assign the required annotations to the columns
          - Define 1 and only 1 column as a measurement
          - Define 1 and only 1 column as Time. The time column format must either be UNIX UTC or ISO 8601.
    - For all .csv file types:
         - Define the remaining columns as either a field or a tag (one or the other) via a radio button.
         - Display the remaining column names in a list below the time dropdown. Display the field/tag assignments to the right. See Figma.  
         - These header assignments must be required as indicated by the disabled 'Finish Upload' button. Consult design on whether or not to add a red asterisk next to required sections.   
         - Default the non-measurement and non-time columns as 'fields.' (auto-select field as the radio button choice) 
         - Display an error if the user doesn't have at least one field column mapped and disable the "Finish Upload" button. "You must have at least one column mapped as a field"             
         - Each header column can be assigned once and only once, to exactly one of these roles: measurement, time, field, tag. Add data validation to prevent duplicate column assignments. 
    - Provide users the option to download their annotated .csv once they've assigned/mapped all of their columns. Consult design (Tara Kirkland) as to where the "download .csv" button / option should be. 
    - Display a 'Cancel' and a 'Finish Upload' Button (See Figma)
    - Disable the "Finish Upload" button until all required content is finished. (_measurement and _time mappings)
    - Error Handling and user feedback: successful upload and failed upload + why fail.
    - The success message should match what's in the Figma. "filename.csv uploaded successfully. Query your data!" Hyperlink the "Query your Data" to the new script editor in the data explorer. 
    - Existing Error Messages for Failed CSV Uploads:
          - failed to read metadata: missing expected annotation datatype.
          - failed to read metadata: failed to read annotations: wrong number of fields
          - failed to read metadata: column “” has invalid datatype: unsupported data type “”
          - failed to read metadata: column “_value” has invalid datatype: unsupported data type “float” 
          - failed to initialize execute state: could not find bucket “<BUCKET>” — 3
          - Failed to execute Flux query — 3
          - compilation failed: error @2:11-2:20222: expected comma in the property list, got ILLEGAL — 1
          - runtime error @3:14-3:32: to: no column with label _measurement exists — 10
          - runtime error @3:14-3:32: to: table has no _field column — 14
    - Reminder to dynamically update the preview of the user's data to match the column assignments that they choose.   
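
A minimal sketch of the mapping rules in the ACs above: exactly one measurement column, exactly one time column, at least one field, every column mapped, and remaining columns defaulting to "field". The type `CsvColumnRole` and the functions `defaultCsvColumnMapping` and `validateCsvColumnMapping` are hypothetical names. Only the "at least one field" message is taken from the ACs; the other messages are placeholders.

```typescript
type CsvColumnRole = 'measurement' | 'time' | 'field' | 'tag'

// Build the initial mapping: the two chosen columns get their roles and
// every remaining column defaults to "field", as the ACs require.
function defaultCsvColumnMapping(
  headers: string[],
  measurementColumn: string,
  timeColumn: string
): Map<string, CsvColumnRole> {
  const mapping = new Map<string, CsvColumnRole>()
  headers.forEach(header => {
    if (header === measurementColumn) {
      mapping.set(header, 'measurement')
    } else if (header === timeColumn) {
      mapping.set(header, 'time')
    } else {
      mapping.set(header, 'field')
    }
  })
  return mapping
}

// Return every rule violation. An empty array means "Finish Upload" could
// be enabled. Keying the map by header gives each column a single role.
function validateCsvColumnMapping(
  headers: string[],
  mapping: Map<string, CsvColumnRole>
): string[] {
  const errors: string[] = []
  const countRole = (role: CsvColumnRole): number =>
    headers.filter(header => mapping.get(header) === role).length

  const seen = new Set<string>()
  headers.forEach(header => {
    if (seen.has(header)) {
      errors.push(`Column name "${header}" appears more than once`)
    }
    seen.add(header)
    if (!mapping.has(header)) {
      errors.push(`Column "${header}" has no assignment`)
    }
  })
  if (countRole('measurement') !== 1) {
    errors.push('Select exactly one measurement column')
  }
  if (countRole('time') !== 1) {
    errors.push('Select exactly one timestamp column')
  }
  if (countRole('field') < 1) {
    errors.push('You must have at least one column mapped as a field')
  }
  return errors
}
```

Re-running the validation on every dropdown or radio-button change would keep the button state and the preview in step with the current mapping.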

   Implementation
    - Are there file size limits per account type? Or in general?
    - Is there a record limit (rate limit) per account type?

User Story: End-to-End Testing

    ACs
    - End to end testing

User Story: UI Eventing

    ACs
    - DO NOT use the word "new" when describing events as someday it will no longer be 'new.' Consult with @kristinarobinson on event names and event properties
    - Events TBD

User Story: Remove Feature Flag

    ACs
    - Remove the feature flag after receiving approval from both product management and engineering leadership.

Dev Notes (Implementation Details)

Existing code to potentially leverage:
- The Influx CLI tool has a CSV parser.
- The Flux engine also has a way to parse CSV data.
- Telegraf has a way to parse CSV data into Line Protocol. There could be some learnings there as well: https://github.com/influxdata/telegraf/tree/master/plugins/parsers/csv
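
As an illustration of the CSV-to-Line-Protocol idea in the last bullet, here is a minimal sketch that serializes one mapped row into InfluxDB line protocol. This is not Telegraf's implementation. `LpPoint`, `formatLpFieldValue`, and `toLineProtocol` are hypothetical names, numbers are always written as floats (no `i` integer suffix), and the timestamp is assumed to already be an epoch value in the precision the write request expects.

```typescript
type LpFieldValue = number | string | boolean

// Hypothetical shape for one CSV row after the column mapping is applied.
interface LpPoint {
  measurement: string
  tags: Array<[string, string]>
  fields: Array<[string, LpFieldValue]>
  timestamp?: string // epoch value as a decimal string
}

// Line protocol escaping: measurements escape commas and spaces; tag keys,
// tag values, and field keys also escape "="; string field values are
// double-quoted and escape quotes and backslashes.
const escapeLpMeasurement = (s: string): string => s.replace(/[, ]/g, '\\$&')
const escapeLpKey = (s: string): string => s.replace(/[,= ]/g, '\\$&')
const escapeLpString = (s: string): string => s.replace(/["\\]/g, '\\$&')

function formatLpFieldValue(value: LpFieldValue): string {
  if (typeof value === 'string') {
    return `"${escapeLpString(value)}"`
  }
  if (typeof value === 'boolean') {
    return value ? 'true' : 'false'
  }
  if (!isFinite(value)) {
    throw new Error('Field values must be finite numbers')
  }
  return String(value)
}

function toLineProtocol(point: LpPoint): string {
  if (point.fields.length === 0) {
    throw new Error('A point needs at least one field')
  }
  // Skip empty tag values and sort tags by key.
  const tagPart = point.tags
    .filter(([, value]) => value !== '')
    .slice()
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .map(([key, value]) => `,${escapeLpKey(key)}=${escapeLpKey(value)}`)
    .join('')
  const fieldPart = point.fields
    .map(([key, value]) => `${escapeLpKey(key)}=${formatLpFieldValue(value)}`)
    .join(',')
  const timePart = point.timestamp ? ` ${point.timestamp}` : ''
  return `${escapeLpMeasurement(point.measurement)}${tagPart} ${fieldPart}${timePart}`
}
```

Converting rows this way would sidestep annotation parsing entirely, at the cost of the UI having to decide field types itself.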

Luckette commented 2 years ago

@taramk Can you work w/ me to come up with new text for the bottom of the new .csv upload page? I could also use your help coming up with a good error message in the event that the csv upload fails.

taramk commented 2 years ago

Figma: https://www.figma.com/file/U2zSUPmTAkOofGEpQkNe5T/Upload-a-CSV?node-id=6%3A28

Text replacements are shown in red. The other change we've made is to move the intro paragraph above bucket selection.

@Luckette

ALuckette commented 2 years ago

@hoorayimhelping and Amy to turn these into smaller tickets.

satorstefan commented 1 year ago

The CSV upload is super annoying! Even when trying examples from https://docs.influxdata.com/influxdb/cloud/reference/syntax/annotated-csv/#data-types it fails.

Case1 Failed to upload the selected CSV: runtime error @3:14-3:32: to: no column with label _measurement exists Of course it is missing, because by the documentation measurement should work.

Case2 Adding the column and the error still is Failed to upload the selected CSV: runtime error @3:14-3:32: to: no column with label _measurement exists

Case3 Example from homepage works

Case4 Setting one column as ignore the error is: Failed to upload the selected CSV: error in csv.from(): failed to read metadata: column "region" has invalid datatype: unsupported data type "ignore"

I tried other examples and it seems to me that any change to the example breaks the import. Maybe some logic is expecting different names for the datatypes once "_" is used anywhere in the file.

Currently the error message is: https://docs.inf case1.csv case2.csv case3.csv case4.csv

satorstefan commented 1 year ago

removing the "result" column results in this error? What is that column used for? Failed to upload the selected CSV: runtime error @3:14-3:42: to: no time column detected

why?

satorstefan commented 1 year ago

Removing the table column results in the error: Failed to upload the selected CSV: runtime error @3:14-3:42: to: no time column detected

satorstefan commented 1 year ago

So any change to the structure of this file results in a strange error... air-sensor-data-annotated.csv

michaeldufault commented 3 months ago

Just dropping in to say that in reaching the end of my rope trying to figure out why my annotated .csv file would not upload - I ended up here. Documentation is very lacking on Influx. I'm now browsing various github repos to see if there are any handy tools for this. Cheers!

hot22shot commented 2 months ago

> Just dropping in to say that in reaching the end of my rope trying to figure out why my annotated .csv file would not upload - I ended up here. Documentation is very lacking on Influx. I'm now browsing various github repos to see if there are any handy tools for this. Cheers!

Did you find something ? This is very annoying.

samuelzamvil commented 1 month ago

This about sums up my experience.

> Users are having trouble loading data that's in a .csv file format through the UI. This is because the UI drag-and-drop section for loading .csv data requires the flux annotated .csv format. It's not obvious to new users what those annotations have to be, despite the documentation available to them. This is causing users to get stuck at the very beginning of their journey with what should be a very simple .csv file upload and data exploration. The most common reason that .csv files fail to upload is the missing annotations.

I've heard so many good things about the product but it's hard to justify the effort to learn a new product when I can't get past step one.

I spent about an hour trying to get past this. Going into the annotations documentation and the csv.from() function doc doesn't help. Going the LLM route, GPT-4o and Claude 3.5 Sonnet couldn't figure out the annotations for me.

I can't tell you how many times I've seen the following error and it doesn't seem to represent the actual error. Adding any number of fields to the annotation header doesn't give me a separate error.

Failed to upload the selected CSV: error in csv.from(): failed to read metadata: failed to read annotations: wrong number of fields

influxdata / ui

Epic: Upload a CSV #1544
