r-three / common-pile

Repo to hold code and track issues for the collection of permissively licensed data
MIT License

SEC Data #31

Open nkandpa2 opened 10 months ago

nkandpa2 commented 10 months ago

@StellaAthena, do you know who has been working on this source or if it's currently unassigned?

StellaAthena commented 10 months ago

It's currently unassigned. I view this as a relatively low priority insofar as I don't think it's likely to be in our top 2 trillion tokens.

conceptofmind commented 10 months ago

IBM just released a dataset of SEC filings, and I also have some data for it. There is also a repository for collecting 10-Q filings: https://github.com/him1411/edgar10q-dataset

sunnydigital commented 9 months ago

Would love to look into this

conceptofmind commented 9 months ago

We can also look into getting all of the 10-Ks.

conceptofmind commented 9 months ago

https://www.sec.gov/os/accessing-edgar-data

sunnydigital commented 9 months ago

Hi all, please fill out the when2meet below so we can find a time to meet and get an understanding of the data. Thanks.

https://www.when2meet.com/?22763662-Q6EqE

conceptofmind commented 9 months ago

There is also this: https://github.com/janlukasschroeder/sec-api

conceptofmind commented 9 months ago

https://github.com/jadchaar/sec-edgar-downloader

StellaAthena commented 8 months ago

I'm going to try to get permission from the SEC to bulk download, which would make this go faster. That said, the data processing code can be written against a relatively small sample of the data, so I would go ahead and build all of the code now.
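To illustrate developing against a small sample, here is a minimal sketch of one possible processing step: converting an HTML filing to plain text with Python's standard-library `html.parser`. The thread does not specify the pipeline, so this step is an assumption. `filing_html_to_text` is a hypothetical name, and real filings (inline XBRL, exhibits, the SGML `.txt` submission wrapper) will need more handling:

```python
from __future__ import annotations

from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}
    # Tags treated as line breaks so paragraphs and table rows stay separate.
    BLOCK = {"p", "div", "br", "tr", "li", "table", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        raw = "".join(self._chunks)
        # Collapse whitespace (including non-breaking spaces) and drop empty lines.
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def filing_html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A function like this can be checked against a handful of hand-picked filings before any bulk download is available.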

conceptofmind commented 8 months ago

Sounds good

craffel commented 4 months ago

@conceptofmind am I remembering correctly that you said that a scrape of this data is ongoing and will take a long time to finish?

conceptofmind commented 4 months ago

I will have to work out a rough estimate of the total time, but it will take a little while due to EDGAR's access restrictions.
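Here is a rough back-of-the-envelope sketch. It assumes SEC's fair-access guidance of at most 10 requests per second. Check the current policy on the accessing-edgar-data page, which also asks automated clients to send a User-Agent that identifies them. `estimate_scrape_hours` and `Throttle` are hypothetical helpers. The request count is whatever the index turns out to contain, not a number from this thread:

```python
from __future__ import annotations

import time
from typing import Callable

# Assumption: SEC's fair-access cap; verify against the current EDGAR policy.
MAX_REQUESTS_PER_SECOND = 10


def estimate_scrape_hours(num_requests: int,
                          requests_per_second: float = MAX_REQUESTS_PER_SECOND) -> float:
    """Lower bound on wall-clock hours if every request is paced at the limit."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return num_requests / requests_per_second / 3600


class Throttle:
    """Blocks so that successive returns from wait() are at least 1/rate seconds apart."""

    def __init__(self, rate: float = MAX_REQUESTS_PER_SECOND,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")

    def wait(self) -> None:
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval
```

For example, 36,000 requests at the 10/s cap take at least one hour. Retries, large documents, and backoff after errors will push the real figure higher. Call `Throttle.wait()` before each request.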