Re-write of importer + query code, Feedback?

untoldone / bloomapi

Create APIs out of public datasources

https://www.bloomapi.com/documentation/public-data

MIT License

Re-write of importer + query code, Feedback? #49

Closed: untoldone closed this issue 10 years ago

untoldone commented 10 years ago

BloomAPI's code is kind of a mess today. It works, but despite the small code base, the messy code will make it very difficult to improve on in the future.

Some people want to use BloomAPI only for its db syncing, without the API, but the db looks exactly like the NPI files (which are 300+ columns wide and not very developer friendly).

In addition, some people are starting to depend on it. Next-step features such as geocoding, taxonomy inclusion, and adding the hospital/physician compare datasets will all be painful to do with the current code.

I would like to rewrite the import + query code in the following ways:

Instead of one table for the full NPI, have one table per 'object' (npi, addresses, taxonomy, associations, etc.)
Have the query code generate SQL with joins rather than having 'projection' code
Remove the pdf-to-schema code entirely, since it is unneeded
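
To make the first two points concrete, here is a minimal sketch of what "one table per object" plus join-generating query code could look like. It is only an illustration under stated assumptions: the table names, column names, and the `build_query` helper are hypothetical, SQLite stands in for the real database so the example runs on its own, and the rows are made-up sample data.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not BloomAPI's actual tables.
SCHEMA = """
CREATE TABLE npis (npi INTEGER PRIMARY KEY, last_name TEXT);
CREATE TABLE addresses (npi INTEGER REFERENCES npis(npi), kind TEXT, city TEXT, state TEXT);
CREATE TABLE taxonomies (npi INTEGER REFERENCES npis(npi), code TEXT, is_primary INTEGER);
"""

# Child table -> column that links it back to npis.npi (assumed layout).
JOINABLE = {"addresses": "npi", "taxonomies": "npi"}


def build_query(filters):
    """Build a parameterized SELECT with one JOIN per child table that is filtered on.

    filters maps (table, column) -> value, where table is "npis" or a key of JOINABLE.
    Returns (sql, params); values are always passed as bound parameters.
    """
    joins, where, params = [], [], []
    for (table, column), value in filters.items():
        if table != "npis" and table not in JOINABLE:
            raise ValueError(f"unknown table: {table}")
        if not column.isidentifier():
            raise ValueError(f"invalid column name: {column}")
        if table in JOINABLE:
            join = f"JOIN {table} ON {table}.{JOINABLE[table]} = npis.npi"
            if join not in joins:  # join each child table only once
                joins.append(join)
        where.append(f"{table}.{column} = ?")
        params.append(value)

    sql = "SELECT DISTINCT npis.npi, npis.last_name FROM npis"
    if joins:
        sql += " " + " ".join(joins)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Usage with made-up sample rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO npis VALUES (1, 'SMITH')")
conn.execute("INSERT INTO npis VALUES (2, 'JONES')")
conn.execute("INSERT INTO addresses VALUES (1, 'practice', 'SEATTLE', 'WA')")
conn.execute("INSERT INTO addresses VALUES (2, 'practice', 'PORTLAND', 'OR')")
conn.execute("INSERT INTO taxonomies VALUES (1, '207Q00000X', 1)")

sql, params = build_query({("addresses", "state"): "WA", ("taxonomies", "is_primary"): 1})
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The point of the sketch is that a search on a child attribute becomes a join against a narrow table instead of a projection over hundreds of wide columns.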

Please provide feedback if you have any!

etagwerker commented 10 years ago

@untoldone +1 on this.

I've started reading the code and I've found it quite complicated.

I didn't know that it was a 300+ columns table. I bet that makes it complicated to write an API on top of that dataset.

I think it's a good idea to separate the huge table into separate tables.

For example, we could turn "Provider Primary Taxonomy Switch_" into a provider_primary_taxonomy_switches table, or turn "Other Provider Identifier_15", "Other Provider Identifier Type Code_15", "Other Provider Identifier State_15", "Other Provider Identifier Issuer_15" into an other_provider_identifiers table with columns id, type_code, state, issuer.
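
As a rough sketch of that normalization, the numbered column groups can be unpivoted into one row per identifier. The target column names (id, type_code, state, issuer) come from the example above. Everything else is an assumption for illustration: the `other_identifier_rows` helper is hypothetical, the wide row is assumed to be a dict keyed by column header, the key column is assumed to be named "NPI", and the sample values are made up.

```python
import re

# Wide-file column prefix -> normalized column name.
FIELDS = {
    "Other Provider Identifier": "id",
    "Other Provider Identifier Type Code": "type_code",
    "Other Provider Identifier State": "state",
    "Other Provider Identifier Issuer": "issuer",
}

# Matches headers such as "Other Provider Identifier State_15".
SUFFIXED = re.compile(r"^(?P<prefix>.+)_(?P<slot>\d+)$")


def other_identifier_rows(wide_row, npi_key="NPI"):
    """Turn the numbered identifier columns of one wide row into a list of narrow rows.

    Empty slots are skipped, so a provider with one identifier yields one row.
    """
    slots = {}
    for column, value in wide_row.items():
        match = SUFFIXED.match(column)
        if not match or match["prefix"] not in FIELDS:
            continue  # not one of the numbered identifier columns
        if value is None or value == "":
            continue  # empty cell in the wide file
        slots.setdefault(int(match["slot"]), {})[FIELDS[match["prefix"]]] = value

    rows = []
    for slot in sorted(slots):
        row = {"npi": wide_row[npi_key], "id": None, "type_code": None,
               "state": None, "issuer": None}
        row.update(slots[slot])
        rows.append(row)
    return rows


# Usage with a made-up wide row: slot 1 is filled in, slot 2 is empty.
wide = {
    "NPI": "1234567890",
    "Other Provider Identifier_1": "A1",
    "Other Provider Identifier Type Code_1": "05",
    "Other Provider Identifier State_1": "WA",
    "Other Provider Identifier Issuer_1": "",
    "Other Provider Identifier_2": "",
    "Other Provider Identifier Type Code_2": "",
}
result = other_identifier_rows(wide)
print(result)
```

Matching on the numeric suffix rather than hard-coding a slot count means the same helper keeps working if the number of repeated column groups changes.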

I believe this normalization will make it easier to add new features and correct bugs.

Just a fair warning, I'm quite a n00b with NPI/NPPES data.

untoldone commented 10 years ago

Agreed. I'm going to look at getting started on this over the weekend to get a feel for how much work is involved.

kevinseelbach commented 10 years ago

@untoldone Have you made any progress on this? Would love to see a fork and collaborate on normalization. I'm not using BloomAPI specifically, but have been observing the project with interest as I'm working on a Python application that uses the NPI dataset.

untoldone commented 10 years ago

@kevinseelbach Yes, I just got started on this rewrite last week or so. The changes are likely to go beyond just normalization of the NPI and turn into a relatively simple-to-use ETL tool + API + search tool for any datasets that are still relatively small. The project is being optimized for the time it takes someone to describe and add a dataset.

Check out the vnext branch (a work in progress), as I'm currently working there. Would love to chat more in depth if you're interested.

untoldone commented 10 years ago

This rewrite has now been completed and is present in v0.2.0. Please let me know if you have any feedback or questions @kevinseelbach @etagwerker.