rabanti-github / NanoXLSX

NanoXLSX is a small .NET library written in C#, to create and read Microsoft Excel files in the XLSX format (Microsoft Excel 2007 or newer) in an easy and native way
MIT License
122 stars 36 forks source link

Version 2_4_0 workbook.load #63

Open alberti47 opened 3 months ago

alberti47 commented 3 months ago

Bug description function workbook.load(filename.text) in version 2-4-0 goes to infinite loop previous version 2_3_3 load same file properly

Environment (please complete the following information):

rabanti-github commented 3 months ago

Hi, Thanks for the report. Can you give me some further details: What is in filename.text? Is this a renamed XLSX file, an old XLS file (also renamed) or a plain text? What happens during the infinit loop? Do you get a stack overflow, is the process just not ending, or are there other events that happens (e.g. a crash, memory consumption is going up, 100% CPU usage)?

Thank you in advance for clarification.

alberti47 commented 3 months ago

Thank for the answer. Really it seems that will be a challenge for You ! this is the piece of code in my large program that recall the load function : // // Verifico se posso aprire il file xlsx // workbook=new Workbook(); try { workbook = Workbook.Load(SelectedFileXLSX.Text); } catch (Exception ex) { MessageBox.Show("errore apertura file xls - Aperto in Excel?"); return; } My application was tested since the first release of NanoXlsx and this piece of code works with the release 2_3_3

What is strange is that if use the demo program , just patching the Basic demo function 👍 private static void BasicDemo() { Workbook workbook; workbook = Workbook.Load("E:\DownLoad\F_2024_07_18.xlsx"); it works.

When in my program the load stops it remain just in some timing loops , no disc activity, no memory activity.

About the file I'm sending the file xlsx. this is a file that time to time is updated , new lines eventually inserted, other updates and is the same since many months from now.

In this file the only field that is particular is the field DATA FINE that when 'blank' l for the application mean 'undefined date'.

I dont have unfortunally any other info. Best regards --> Attached file removed

rabanti-github commented 3 months ago

Hi, Thank you for the update. I testes the reader with the attached XLSX file and your code. The file could be read here without any issues. I even renamed the file to the extension .Text (no problem to read too). It should also be no general problem, otherwise several dozens to hundred of unit tests would possibly fail.

But here are some things you could test:

rabanti-github commented 3 months ago

Two other additions:

alberti47 commented 3 months ago

I tried all Yr suggestion , and I debug my application using version 2_4_0. The problem is in the function private async Task ReadInternal() in module XlsxReader.cs that was implemented starting from 2_4_0 in lines await fs.CopyToAsync(memoryStream); await inputStream.CopyToAsync(memoryStream); since the CopyToAsyn stay in loop without exiting.

Just changing the above 2 lines avoiding async copying , ie fs.CopyTo(memoryStream); inputStream.CopyTo(memoryStream);

the application now run properly. The CopyToAsync although nice programming pratice I'snt (for my opinion) unnecessary as included in another async Task ReadInternal and does not produce noticeable speed effects.

Waiting for comments Regards

rabanti-github commented 3 months ago

There is probably a deadlock issue with the stream you are using. It seems that Task.GetAwaiter().GetResult() can lead to deadlocks under certain circumstances. Using this approach is BTW not a matter of speed but to have a unified platform. This attempt works in the majority of use cases. However, I have to investigate the deadlock cases of course. For that, it is important to know whether you are already in a asynchronous state or in a tread, when calling the Workbook.Read method. Can you somehow provide the code, how you read files? Otherwise, it is very hard to analyze the issue to find a solution that fixes the problem and do not affect the setup of other users. Thanks in advance.

alberti47 commented 3 months ago

I'm sending small application I was able to reproduce the problem c# Visual studio community 2022 Hope this will help regards Test_For_DeadLock.zip

rabanti-github commented 3 months ago

Thank you for the code. I think I have found a solution for you. Since you are with WinForms already in a threaded context, you can (an probably should) indeed used the async methods. This should also be beneficial to keep the UI responsive, if I'm not wrong (sorry, my last WinForms project is some years back). So, this code should help:

if (SelectedFileXLSX.Text.Trim() != "")
{
    try
    {
        workbook = await Task.Run(() => Workbook.LoadAsync(SelectedFileXLSX.Text));
    }
    catch (Exception ex)
    {
        MessageBox.Show("errore apertura file xls - Aperto in Excel?");
        return;
    }
    MessageBox.Show("DONE");
}

Can you try this out?

alberti47 commented 3 months ago

Thanks for Yr effort to explain the problem. Yr solution works on my example if inserted in the dependent Form, but does not give a complete answer to question I opened. In fact I would like now an explanation why using the Demo program the Load method works ! Thanks Best regards

rabanti-github commented 3 months ago

Because the demo program (assumed executed in this class) runs in a single thread. The demo program is started, it runs in the main thread and ends. The main Thread is blocking for a short time during the process.

A WinForms project (same for WPF) uses a ThreadPool and has normally at least two threads. One of them is the UI thread. This is also the reason why cross-form actions must be invoked. The used ThreadPool constellation seems to have an issue with GetAwaiter().GetResult(). I still have to figure out how to handle this. Even it is highly encouraged to use the async methods in a multi-thread environment, it should at least throw an exception or run into a timeout if used in a blocking manner. I try to fix this in a future release. Maybe I also add a note in the API-doc / readme to avoid using non-async methods in UI projects