tidy-finance / website

This repository hosts the source code for the website tidy-finance.org
https://tidy-finance.org

Download Compustat data with currency 'curcd' in SQL to distinguish (or exclude) CAD and USD #131

Closed: DayleLee closed this issue 2 months ago

DayleLee commented 2 months ago

The North America Compustat database records a reporting currency ('curcd') that is either CAD or USD. This matters when calculating the book-to-market ratio (BM): the denominator (ME) from CRSP is always in USD, so the numerator (BE) from Compustat must also be in USD, or be translated into USD, for the ratio to be meaningful. Fortunately, few valid stocks (after merging on 'permno' and 'gvkey') report in CAD. One easy solution is to exclude them when downloading. Another is currency translation, which Compustat describes in Chapter 8, 'Currency Data', of its manual 'Standard & Poor’s Compustat® Xpressfeed Understanding the Data'.
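For illustration, the "exclude when downloading" idea could look roughly like the sketch below. It uses an in-memory SQLite table as a stand-in for the Compustat annual table, so the table name `funda`, the selected columns, and all values are assumptions made up for the example, not the actual query used for the book.

```python
import sqlite3

# Toy stand-in for the Compustat annual fundamentals table.
# Table name, columns, and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE funda (gvkey TEXT, datadate TEXT, curcd TEXT, seq REAL)"
)
conn.executemany(
    "INSERT INTO funda VALUES (?, ?, ?, ?)",
    [
        ("001000", "2022-12-31", "USD", 150.0),
        ("002000", "2022-12-31", "CAD", 200.0),
        ("003000", "2022-12-31", "USD", 75.0),
    ],
)

# Keep only firm-years reported in USD so that book equity is in the
# same currency as CRSP market equity.
usd_rows = conn.execute(
    "SELECT gvkey, datadate, curcd, seq "
    "FROM funda WHERE curcd = ? ORDER BY gvkey",
    ("USD",),
).fetchall()

for row in usd_rows:
    print(row)
```

Selecting 'curcd' alongside the accounting variables also keeps the currency visible in the downloaded data, so the CAD observations can be distinguished later instead of being dropped silently.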

christophscheuch commented 2 months ago

@patrick-weiss should we just exclude companies with curcd != "USD"?

patrick-weiss commented 2 months ago

In principle, this issue should not arise because the stock market data should not include such companies, i.e., no CAD accounting data should be matched to USD stock data. In fact, the matching works nearly perfectly: around 10% of Compustat's data entries are reported in CAD, yet the matched data contains only some 15 companies that report in CAD. It's frustrating, but it's a very good catch by @DayleLee.
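A check of this kind can be sketched in a few lines. The link pairs and currency codes below are made-up toy data, and the variable names are hypothetical; the point is only to show how one might count matched firms by reporting currency.

```python
from collections import Counter

# Hypothetical permno-gvkey links (toy data, not real identifiers).
links = [(10001, "001000"), (10002, "002000"), (10003, "003000")]

# Hypothetical reporting currency per gvkey; "004000" has no CRSP match.
reporting_currency = {
    "001000": "USD",
    "002000": "CAD",
    "003000": "USD",
    "004000": "CAD",
}

# Count matched firms by reporting currency.
matched_currency_counts = Counter(
    reporting_currency[gvkey]
    for _, gvkey in links
    if gvkey in reporting_currency
)

# List the matched firms that report in CAD.
cad_gvkeys = sorted(
    {gvkey for _, gvkey in links if reporting_currency.get(gvkey) == "CAD"}
)

print(matched_currency_counts)
print(cad_gvkeys)
```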

@christophscheuch's solution seems reasonable but comes with one downside: People might just copy and paste it to download other data, where companies reporting in CAD would be perfectly fine. For the book, I think it makes sense to exclude them right in the beginning (with some mention in the text). For the R package tidyfinance, we might consider it as an option. What do you think?
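To make the "option" idea concrete, here is a minimal sketch of a filter with a switch. The function name `filter_reporting_currency` and the argument `only_usd` are hypothetical and do not describe the tidyfinance package's actual interface.

```python
def filter_reporting_currency(records, only_usd=True):
    """Return records, optionally keeping only those reported in USD.

    `records` is a list of dicts with a 'curcd' key. The function name
    and the `only_usd` flag are illustrative assumptions.
    """
    if not only_usd:
        return list(records)
    return [r for r in records if r["curcd"] == "USD"]


# Toy records with made-up values.
records = [
    {"gvkey": "001000", "curcd": "USD", "seq": 150.0},
    {"gvkey": "002000", "curcd": "CAD", "seq": 200.0},
]

usd_only = filter_reporting_currency(records)            # default: drop CAD
everything = filter_reporting_currency(records, only_usd=False)

print(len(usd_only), len(everything))
```

Whether such a switch should default to excluding CAD is a design choice: excluding by default protects ratios that mix Compustat and CRSP values, while an explicit switch keeps CAD data available for uses where it is fine.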

christophscheuch commented 2 months ago

I agree that it is a good solution to exclude them explicitly in the book and provide an option for the R package.