microsoft / prose

Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
https://microsoft.github.io/prose/

It takes about 2 hours to detect the Excel file which contains some pictures. #78

Open ivanliu-microsoft opened 2 months ago

ivanliu-microsoft commented 2 months ago

Our CSV parser uses PROSE to check whether string content is in CSV format. A customer reported that parsing took more than 2 hours before it returned false (not a valid CSV file). I found that it is blocked at the line of code below (see the full code):

image

The sample Excel file is attached: Quotation-Personal care wipes.xlsx

My question is:

ashishxtiwari commented 2 months ago

Can you clarify what exactly is being used to set "strData"? In other words, how is "strData" generated from the shared excel file? (I can't access the Babylon repo to find out.)

ivanliu-microsoft commented 2 months ago

Yes, strData is the content of the shared Excel file.

(Sent by email in reply to Ashish Tiwari's message of Tuesday, August 20, 2024, 1:46 AM.)


ivanliu-microsoft commented 2 months ago

Our CSV parser does not feed any data format to PROSE. We only call its Learn() API to detect whether the file is CSV (that is, whether it returns a CsvProgram). See the screenshot attached above.

ivanliu-microsoft commented 2 months ago


Hi Ashish, are there any updates on your side? We need a workaround for this. Many thanks.
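
One possible stopgap, sketched here under assumptions and not confirmed by anyone in this thread: since the input is the content of an .xlsx file, and .xlsx files are ZIP containers, a cheap check on the leading bytes could reject such input before Learn() is ever called. The sketch below is in Python for brevity and assumes the raw bytes are available before they are decoded into strData. The function name `looks_like_text` and the 8192-byte sample size are hypothetical and not part of PROSE.

```python
# Hypothetical pre-check to run before the expensive CSV detection step.
# None of these names come from the PROSE SDK.

ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header; .xlsx/.docx/.pptx start with this
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls/.doc compound files


def looks_like_text(data: bytes, sample_size: int = 8192) -> bool:
    """Return False when the payload is obviously a binary container.

    Assumption: callers treat False as "not CSV" and skip the learner.
    Note that NUL bytes also appear in UTF-16 text, so this check would
    reject UTF-16 encoded CSV; adjust if that encoding must be supported.
    """
    head = data[:sample_size]
    if head.startswith(ZIP_MAGIC) or head.startswith(OLE2_MAGIC):
        return False
    if b"\x00" in head:
        return False
    return True
```

A caller would run this check first and only invoke the CSV detection when it returns True. This does not explain why the learner takes hours on such input; it only avoids reaching that code path for files that cannot be CSV.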