granturing / spark-power-bi

Power BI API adapter for Apache Spark (deprecated)
Apache License 2.0

New limitations of Power BI #1

Open KaganRoman opened 8 years ago

KaganRoman commented 8 years ago

Great work on the library, many thanks!

Just to mention the new Power BI limits, which cause failures when using spark-power-bi:

- 1,000,000 rows added per hour per dataset
- 120 POST rows requests per minute per dataset
- 5 max pending POST rows requests per dataset
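Client-side logic that respects the first two limits has to track recent POSTs in sliding windows. Below is a minimal Scala sketch of that idea. `PowerBiThrottle`, `delayFor` and `record` are hypothetical names, not part of spark-power-bi or any Power BI SDK. The sketch assumes a single sender per dataset that reports timestamps in non-decreasing order. The pending-request limit is a concurrency cap and is not modeled here.

```scala
import scala.collection.mutable

// Hypothetical client-side throttle (not part of spark-power-bi) for one dataset.
// Enforces two per-dataset limits using sliding windows of past POSTs:
// at most `maxRequestsPerMinute` POSTs per minute and `maxRowsPerHour` rows per hour.
final class PowerBiThrottle(
    maxRowsPerHour: Long = 1000000L,
    maxRequestsPerMinute: Int = 120
) {
  private val HourMs = 60L * 60L * 1000L
  private val MinuteMs = 60L * 1000L

  // (sendTimeMs, rowCount) for each recorded POST, oldest first.
  private val history = mutable.Queue.empty[(Long, Int)]

  /** Milliseconds to wait before a POST of `rows` rows may be sent at `nowMs` (0 = send now). */
  def delayFor(rows: Int, nowMs: Long): Long = {
    require(rows <= maxRowsPerHour, "a single POST cannot exceed the hourly row cap")

    // Forget POSTs that have left the one-hour window.
    while (history.nonEmpty && history.head._1 <= nowMs - HourMs) history.dequeue()

    // Per-minute request limit: if the window is full, wait until the oldest POST in it expires.
    val lastMinute = history.iterator.filter(_._1 > nowMs - MinuteMs).toVector
    val requestDelay =
      if (lastMinute.size < maxRequestsPerMinute) 0L
      else lastMinute(lastMinute.size - maxRequestsPerMinute)._1 + MinuteMs - nowMs

    // Hourly row limit: release the oldest POSTs until the new rows fit under the cap.
    val rowsLastHour = history.iterator.map(_._2.toLong).sum
    var released = 0L
    var waitUntil = nowMs
    val oldestFirst = history.iterator
    while (rowsLastHour - released + rows > maxRowsPerHour && oldestFirst.hasNext) {
      val (sentAt, count) = oldestFirst.next()
      released += count
      waitUntil = sentAt + HourMs
    }

    math.max(requestDelay, waitUntil - nowMs)
  }

  /** Call after a POST has actually been sent. */
  def record(rows: Int, nowMs: Long): Unit = history.enqueue((nowMs, rows))
}
```

In a send loop you would call `delayFor`, sleep for the returned number of milliseconds, send the POST, and then `record` it.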

I tried to send a large dataset using the library and got errors after 200K rows: "java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]"

When I added a delay between calls, I managed to send 1M rows, but then got this exception: "Maximum number of rows exceeded in time window. Please retry later". This forces us to implement logic that takes the hourly limit into account.
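A common alternative to a fixed delay between calls is to retry with an exponentially growing delay only when a call fails transiently. The Scala sketch below assumes the 30-second failure surfaces as a `java.util.concurrent.TimeoutException`. `Retry`, `retryWithBackoff` and `isTransient` are hypothetical names, not library API. Short retries like this will not get past the hourly row cap: "Please retry later" there means waiting for the one-hour window to roll over, which calls for budgeting rows up front instead.

```scala
import java.util.concurrent.TimeoutException
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical helper (not part of spark-power-bi): re-run `op` with exponentially
// growing sleeps when it fails with an error that `retryable` accepts.
object Retry {
  def retryWithBackoff[T](maxAttempts: Int, initialDelayMs: Long)(
      retryable: Throwable => Boolean)(op: => T): T = {
    @tailrec
    def loop(attempt: Int, delayMs: Long): T =
      Try(op) match {
        case Success(value) => value
        case Failure(e) if attempt < maxAttempts && retryable(e) =>
          Thread.sleep(delayMs) // back off before the next attempt
          loop(attempt + 1, delayMs * 2)
        case Failure(e) => throw e // non-retryable error, or attempts exhausted
      }
    loop(1, initialDelayMs)
  }

  // Assumption: only client-side timeouts (e.g. waiting on a POST future) are worth retrying.
  def isTransient(e: Throwable): Boolean = e.isInstanceOf[TimeoutException]
}
```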

granturing commented 8 years ago

Yes, unfortunately there's no real way around the limits of the Power BI API. If you're dealing with high-frequency or high-volume data streams, the best bet would be to push to Event Hubs or just use another dedicated storage system (Azure Blobs, HDFS, SQL, DocumentDB, etc.).

The 5-concurrent-request limit is currently handled by coalescing the RDD in each batch down to 5 partitions, but unfortunately that can degrade performance since it reduces parallelism. The library also batches internally so that each POST request sends at most 10,000 rows, which is the other limit per the docs here https://msdn.microsoft.com/en-us/library/dn950053.aspx.
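To make the batching concrete, here is a pure-Scala sketch with no Spark dependency. `BatchPlan` and `plan` are hypothetical names. The round-robin split only approximates `coalesce(5)`, which merges existing partitions rather than redistributing rows. The 5 and 10,000 figures come from the limits discussed in this thread.

```scala
// Hypothetical illustration (not the library's code): split one batch of rows into
// at most 5 partitions, then each partition into POST-sized chunks of <= 10,000 rows.
object BatchPlan {
  val MaxConcurrentPosts = 5
  val MaxRowsPerPost = 10000

  /** Returns partitions, each a sequence of POST payloads, each a sequence of rows. */
  def plan[A](rows: Seq[A]): Seq[Seq[Seq[A]]] = {
    // Round-robin rows into partitions (a rough stand-in for coalesce(5)).
    val partitions = rows.zipWithIndex
      .groupBy { case (_, i) => i % MaxConcurrentPosts }
      .toSeq
      .sortBy(_._1)
      .map { case (_, indexed) => indexed.map(_._1) }
    // Each partition sends its rows sequentially in chunks of at most 10,000.
    partitions.map(_.grouped(MaxRowsPerPost).toSeq)
  }
}
```

For example, 123,456 rows become 5 partitions of roughly 24,691 rows, each sent as 3 POSTs, so 15 POSTs in total. The same arithmetic explains the behavior reported above: at 10,000 rows per POST, 1,000,000 rows take only 100 POSTs, well under 120 per minute, so for sustained loads the binding constraint is the 1,000,000 rows per hour cap rather than the request rate.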