msperlin / GetDFPData2

Repository for the development of R package GetDFPData2
get_info_companies() not returning companies tickers #4

Closed by Shortiiiiie 1 year ago

Shortiiiiie commented 3 years ago

@msperlin Hi Professor, thank you so much for this package; it's amazing.

However, I can't retrieve a list of tickers for Brazilian companies from the data frame returned by get_info_companies(). The list downloaded from CVM does not have a column with the tickers.

Can you please help me?

msperlin commented 3 years ago

Hi there,

Unfortunately, as far as I know, ticker information is not available from any public source besides the B3 website, and I can't scrape their site due to their policies. If you've found a new public source, please let me know.

kafran commented 1 year ago

For future reference:

This information will certainly need some curation.

msperlin commented 1 year ago

Thanks @kafran. I'll have a look and check if I can incorporate this information in the code.

kafran commented 1 year ago

If you have any other ideas, please post them here, as I am trying to develop an ETL workflow myself =)

msperlin commented 1 year ago

This seems to be the best way forward:

https://dados.cvm.gov.br/dataset/cia_aberta-doc-fca

I have worked with these CVM files in the past and they are of great quality. I already know how to parse them. This week or next I'll try to implement it.

kafran commented 1 year ago

I'll be using these datasets then. Do you know if they are cumulative? Can I rely solely on the most recent one, from 2023? It appears that the key for this corpus is the Doc_ID, but I'm concerned about whether a company that hasn't submitted any changes in 2023 will still appear in the most recent corpus. This is vital for us, since we need to link every company to its CD_CVM.

msperlin commented 1 year ago

Yes, given the volume of data, I suspect it is cumulative. You'll probably be fine using the most recent one.
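
Since the cumulative behaviour is only suspected, a defensive option is to stack several yearly files and keep, for each CD_CVM, the rows from its most recent year. The snippet below is a minimal sketch of that idea on toy data. The data frames, the `year` and `ticker` columns, and the ticker values are made up for illustration; only CD_CVM and Doc_ID are names taken from this thread, and the real CVM files may use different column names.

```r
# Toy stand-ins for two yearly files (hypothetical columns and values).
fca_2022 <- data.frame(
  CD_CVM = c(1, 2, 3),
  Doc_ID = c(101, 102, 103),
  ticker = c("AAAA3", "BBBB3", "CCCC3"),
  year   = 2022,
  stringsAsFactors = FALSE
)
fca_2023 <- data.frame(
  CD_CVM = c(1, 1, 3),            # company 2 filed nothing in 2023
  Doc_ID = c(201, 201, 203),
  ticker = c("AAAA3", "AAAA4", "CCCC4"),
  year   = 2023,
  stringsAsFactors = FALSE
)

# Stack all years into one table.
fca_all <- rbind(fca_2022, fca_2023)

# For every row, look up the most recent year available for its company.
latest_year <- ave(fca_all$year, fca_all$CD_CVM, FUN = max)

# Keep only rows from each company's most recent year; a company with
# several tickers keeps all of them.
latest <- fca_all[fca_all$year == latest_year, ]
latest <- latest[order(latest$CD_CVM, latest$ticker), ]
rownames(latest) <- NULL

# Companies present in 2022 but absent from 2023. A non-empty result means
# the newest file alone would not cover every CD_CVM.
missing_in_newest <- setdiff(fca_2022$CD_CVM, fca_2023$CD_CVM)
```

If `missing_in_newest` comes back empty on the real files, that supports using the most recent file alone.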

msperlin commented 1 year ago

The new function GetDFPData2::get_tickers() is now available. Let me know if it works well for you (it passed all my tests).
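
For anyone building the CD_CVM relationship discussed above, the join could look roughly like the sketch below. It uses toy data frames rather than live calls; in practice the two tables would presumably come from get_info_companies() and get_tickers(). The column names `name` and `ticker` are assumptions, and a shared CD_CVM column in both outputs is also assumed, so check the actual results before relying on this.

```r
# Toy stand-in for company info (hypothetical columns and values).
info <- data.frame(
  CD_CVM = c(1, 2, 3),
  name   = c("Company A", "Company B", "Company C"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a ticker table; one company may have several tickers.
tickers <- data.frame(
  CD_CVM = c(1, 1, 3),
  ticker = c("AAAA3", "AAAA4", "CCCC4"),
  stringsAsFactors = FALSE
)

# Left join: keep every company, even those without a ticker.
joined <- merge(info, tickers, by = "CD_CVM", all.x = TRUE)

# Companies that ended up with no ticker and may need manual curation.
no_ticker <- unique(joined$name[is.na(joined$ticker)])
```

A left join is used so that companies without a listed ticker stay visible instead of silently dropping out.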