neokd / DataStorehouse

DataStoreHouse is an open-source project that aims to create a collaborative platform for gathering and sharing a wide variety of datasets. It provides a centralised repository where individuals and organisations can contribute, discover, and collaborate on datasets across many domains.
https://datash.vercel.app
MIT License
18 stars, 22 forks

Implemented find_outliers.py #37

Closed: rtiop closed this pull request 1 year ago

rtiop commented 1 year ago

Description

This pull request proposes adding a new file, find_outliers.py. Run from the terminal, the script finds every "cell" in a JSON or CSV file that is an outlier within its category (column), meaning its Z-score is greater than 3, and prints these cells to the terminal.
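The PR's code is not reproduced on this page, so the following is only a minimal sketch of the Z-score rule described above, not the contents of find_outliers.py. It assumes each row is a dict mapping column name to value (the shape csv.DictReader produces, and that json.load yields for a JSON array of objects), skips cells that cannot be parsed as finite numbers, and uses the population standard deviation. The function name find_outliers and its threshold parameter are illustrative.

```python
import csv
import io
import math
import statistics


def find_outliers(rows, threshold=3.0):
    """Return (row_index, column, value, z_score) for every numeric cell whose
    Z-score within its column is greater than `threshold`.

    Assumptions for this sketch: `rows` is a list of dicts (column -> value),
    missing or non-numeric cells are ignored, and the population standard
    deviation is used.
    """
    outliers = []
    columns = rows[0].keys() if rows else []
    for column in columns:
        # Collect the numeric cells of this column along with their row index.
        cells = []
        for index, row in enumerate(rows):
            try:
                number = float(row.get(column))
            except (TypeError, ValueError):
                continue  # missing, blank, or non-numeric cell
            if math.isfinite(number):  # ignore "nan"/"inf" strings
                cells.append((index, number))
        if len(cells) < 2:
            continue  # not enough numbers to measure spread
        numbers = [value for _, value in cells]
        mean = statistics.fmean(numbers)
        stdev = statistics.pstdev(numbers)
        if stdev == 0:
            continue  # constant column: Z-scores are undefined
        for index, value in cells:
            z = (value - mean) / stdev
            if z > threshold:  # the PR's rule: Z-score > 3
                outliers.append((index, column, value, z))
    return outliers


if __name__ == "__main__":
    # Example: twenty readings of 10 and one reading of 100.
    csv_text = "reading,sensor\n" + "".join(f"10,s{i}\n" for i in range(20)) + "100,s20\n"
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for index, column, value, z in find_outliers(rows):
        print(f"row {index}, column {column!r}: {value} (z = {z:.2f})")
        # row 20, column 'reading': 100.0 (z = 4.47)
```

The check follows the PR description literally (Z-score > 3), so only unusually high values are flagged; comparing abs(z) against the threshold would also catch unusually low ones. With the population standard deviation, no value in a column of n numbers can have a Z-score above sqrt(n - 1), so a column needs at least 11 numeric values before any cell can exceed 3.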

Related Issues

This script addresses the first feature mentioned in issue #31, "Create a python script to determine the performance and accuracy of the datasets". No other feature from that issue is implemented.

Changes Made

- Create a Python script, find_outliers.py, in the Validation folder

Additional Notes

  1. Currently only CSV and JSON are supported, but support for other file types could be added later without much difficulty.
  2. The script could easily be adapted into a function that another script calls, returning its results instead of printing them to the terminal.
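Both notes come down to how the script is structured. Here is a hypothetical sketch (not the PR's code): loaders are looked up by file extension in a LOADERS dictionary, so supporting a new format means registering one more function, and load_table returns the rows instead of printing them, so another script can import and call it. The names LOADERS, load_csv, load_json and load_table are illustrative, and JSON input is assumed to be an array of objects.

```python
import csv
import json
from pathlib import Path


def load_csv(path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_json(path):
    """Read a JSON file that is assumed to hold an array of objects."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    return data


# Hypothetical registry: supporting a new file type means adding one entry here.
LOADERS = {
    ".csv": load_csv,
    ".json": load_json,
}


def load_table(path):
    """Pick a loader from the file extension and return the parsed rows."""
    suffix = Path(path).suffix.lower()
    loader = LOADERS.get(suffix)
    if loader is None:
        supported = ", ".join(sorted(LOADERS))
        raise ValueError(f"unsupported file type {suffix!r}; supported: {supported}")
    return loader(path)
```

A new format, for example a hypothetical load_tsv, would be added with LOADERS[".tsv"] = load_tsv. The terminal script would then only call load_table (and the outlier check) inside its own main() and print the results there.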
vercel[bot] commented 1 year ago

Someone is attempting to deploy a commit to a Personal Account owned by @neokd on Vercel.

@neokd first needs to authorize it.

neokd commented 1 year ago

@rtiop, please use:

if __name__ == '__main__':
    main()

rtiop commented 1 year ago

Done. I've seen this in many scripts, but I never understood what it was for. What is its purpose?

neokd commented 1 year ago

It is mainly used so that the file can also be imported as a module without running main(). You can refer here for more details.
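To expand on that answer: Python sets __name__ to "__main__" only for the file being run directly. When the same file is imported, __name__ is the module's name, so the guarded main() call is skipped. The sketch below demonstrates both cases with a throwaway module written to a temporary directory; demo_module and its calls list are hypothetical names used only for illustration.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Source of a throwaway module that records whether main() ran.
SOURCE = '''
calls = []

def main():
    calls.append("main ran")

if __name__ == "__main__":
    main()
'''


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "demo_module.py"
        path.write_text(SOURCE, encoding="utf-8")

        # Importing: __name__ is "demo_module", so the guarded main() is skipped.
        spec = importlib.util.spec_from_file_location("demo_module", str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # Running as a script: __name__ is "__main__", so main() executes.
        script_globals = runpy.run_path(str(path), run_name="__main__")

    return module.calls, script_globals["calls"]


if __name__ == "__main__":
    imported_calls, script_calls = demo()
    print("imported:", imported_calls)   # imported: []
    print("as script:", script_calls)    # as script: ['main ran']
```

This is what lets a file like find_outliers.py work both as a terminal command and as a module whose functions another script can import.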