tjscience / RoboSharp

RoboSharp is a .NET wrapper for the awesome Robocopy windows application.
MIT License
217 stars 67 forks source link

ExcludeFiles and ExcludeDirectories - possible improvement? #121

Closed PCAssistSoftware closed 2 years ago

PCAssistSoftware commented 2 years ago

For both of the above you can pass multiple values but if a file name or path contains spaces then they need to be encased in quote

e.g.

c:\files c:\pictures

or

c:\files "c:\my photos"

etc

If you pass a value which has spaces but don't enclose it in quotes then it doesn't work - as you would expect.

Can we do anything to parse the value you are passing to either of these to ensure it is quote as necessary?

Not a simple task and may not even be possible as would need to be able to work out what is a space between two separate entries and what is a space in the middle of a path e.g.

file1.txt file2.txt

would be obvious as space is between two separate file names, but

file.txt my file.txt

wouldn't be obvious as one space between file names and one space 'in' the second file name

Thoughts / ideas? or is this just going to have to rely on advising user how to format values rather than being able to enforce or help with it?

RFBomb commented 2 years ago

The only foolproof way to implement this within the library would be to introduce a List for exclude files / directories, then loop through the list when building the command. That way, each string in the list is processed individually, and is sensible for consumers to implement. Then just use a 'LoggingOptions.WrapPath' method to process them

PCAssistSoftware commented 2 years ago

Thanks for reply - yes I thought that may be the answer, at least like that you can ensure all items in list are quoted correctly if contain spaces etc

Probably something best handled in the calling application rather than directly in the library then, as I do for things such as file filters

RFBomb commented 2 years ago

Rather than updating the current methodology of just storing the list in a string format, the only thing I can think of at the moment is introduce a new method that accepts IEnumerable as the parameter, loops through as I described above, added each item to the desired list of inclusion/exclusions. So we would need a method for each property to be able to do that. THis way, it can accept a list, but will do the conversion automatically, ensuring we don't break existing applications or cause ourselves unneccessary headache by converting it to a list then being backwards compatible.

PCAssistSoftware commented 2 years ago

Hi @RFBomb

Just noticed that in latest version you have added in SelectionOptions.ExcludedFiles.Add and SelectionOptions.ExcludedDirectories.Add

Not sure at what stage of implementation you are with these but just been giving them a try - apologies if testing before ready!

Currently they have same issue I had when originally opening this issue e.g. the handling of a space in a file or path name

e.g.

Adding

job2.SelectionOptions.ExcludedFiles.Add("myfile.txt")
job2.SelectionOptions.ExcludedFiles.Add("my file.txt")

Doesn't work as created as splitting 'my file.txt' into 'my' and 'file.txt'

"C:\Users\darre\Desktop\1b C:\Users\darre\Desktop\2 ""*.*"" /COPY:DAT /DCOPY:DA /E /XF myfile.txt my file.txt /R:0 /W:30 /BYTES "

:
:: Exclude These Files :
::
    /XF     :: eXclude Files matching these names
        myfile.txt
        my
        file.txt
RFBomb commented 2 years ago

I can do testing today on this, I thought I used 'WrapPath' on it, but I guess not.

Solution will be to update the 'get' method of ExcludeFiles to something like this:

get{
String RetString="";
ForEach( string s in ExcludedFiles )
{
RetString += s.WrapPath() + " ";
}
Return RetString;
}
PCAssistSoftware commented 2 years ago

Okay thanks.

And I suppose we will need to potentially amend GUI to have Add button for both ExcludeFiles and ExcludeDirectories - unless we can also parse content entered in existing text boxes?

image

RFBomb commented 2 years ago

Take a look at the old property

PCAssistSoftware commented 2 years ago

Take a look at the old property

If you mean it is marked as obsolete then yes I realized that, hence saying it needs reflecting in GUI as current fields do nothing

RFBomb commented 2 years ago

It doesn't do nothing though.

The get functionality works as before, and with the 'get' change I recommend in upper comment in this thread, it will sanitize the list values into a robocopy compatible string.

If possible, the 'set' method should be the only one marked as obsolete now that I think about it.

Either way, the set method still sets up the list for ExcludedFiles/Dirs by splitting the input string, similar to how RoboCopy was evaluating the split. This is so that it doesn't break old code that was previously working, but instead just issues a compiler warning.

So get and set still work fine, but set is really the one that consumers may have to rework.

This is what I referred to as with my remark about testing this and the regex split that occurs for those properties

PCAssistSoftware commented 2 years ago

Okay, only mentioned it because at this moment in time it doens't do anything at all - what I enter in existing text boxes is not used, hence asking.

image

This is what I referred to as with my remark about testing this and the regex split that occurs for those properties

Yes I had made a note to do some testing on this today because you mentioned to do so - hence my replies

RFBomb commented 2 years ago

That's why I said the Regex loop needs testing here in the PR

PCAssistSoftware commented 2 years ago

That's why I said the Regex loop needs testing here in the PR

Yes I had made a note to do some testing on this today because you mentioned to do so

And that is exactly why I am testing and reporting back :)

RFBomb commented 2 years ago

Ok so breakpoint line 108 of SelectionOptions to see what's happening.

I'm getting pulled to do another job to far so I can't.

PCAssistSoftware commented 2 years ago

Ok so breakpoint line 108 of SelectionOptions to see what's happening.

I'm getting pulled to do another job to far so I can't.

No problem, now looking..

PCAssistSoftware commented 2 years ago

So the value I am entering in box (which I would suspect it to struggle with and was whole point of initially opening this issue e.g. spaces in a file name or path) is

myfile.txt my file.txt

So on line 106 'value' correctly contains "myfile.txt my file.txt"

image

By line 108 the collection contains 2 items - 'myfile.txt' and 'my' - so issue here as expected space has thrown it out, but for some reason it has also dropped the last part which you would think it might see as a 3 item e.g. file.txt

image

Lines 110/111 appears to successfully add those two items to the list

But when I look at contents of copy in line 276 of MainWindow.xaml.cs it doesn't show the exclusions in the CommandOptions and ExcludedFiles Count is 0

image

RFBomb commented 2 years ago

Ok, breakpoint line 160 in selection options, does the list show a count of 0?

PCAssistSoftware commented 2 years ago

myfile.txt my file.txt

No - excludedFiles shows a count of 2

image

RFBomb commented 2 years ago

Ok, so we know items are being added to the list.

We know the regex needs to be updated because it's skipping the last item in the list - this is likely due to searching for a space character at the end of the trim and it doesn't exist, the the capture fails.
But it correctly caught 'my' in its own.

(Previous functionality required that these input string be properly wrapped with quotation mark, and that's what the register is designed to find split according to, so that seems to be working as expected so far. If you wrap "my file" in quotations the regex should include that as one list item (hopefully)

The 'get' accessor stills needs to be updated to the code I provided in a previous comment.

PCAssistSoftware commented 2 years ago

Ok, so we know items are being added to the list.

agreed

We know the regex needs to be updated because it's skipping the last item in the list - this is likely due to searching for a space character at the end of the trim and it doesn't exist, the the capture fails. But it correctly caught 'my' in its own.

(Previous functionality required that these input string be properly wrapped with quotation mark, and that's what the register is designed to find split according to, so that seems to be working as expected so far. If you wrap "my file" in quotations the regex should include that as one list item (hopefully)

unfortunately not , tried entering with quotes around three different file names e.g.

image

collection still ignorning last one and getting confused by space

image

The 'get' accessor stills needs to be updated to the code I provided in a previous comment.

Okay will try amending that as per you previous comment

RFBomb commented 2 years ago

Ok, so the regex needs to be updated.

If you want, you can use https://regex101.com test out the regex and play around until it works against your test string, pulling out the multiple matches.

That's the site I use to assist with all the other regex when debugging them.

Keep in mind that someone might not necessarily have put the '.' Into the string, though I expect mist would. ( expected is *.txt, but SomeF* is still a valid filter)

PCAssistSoftware commented 2 years ago

I will certainly have a play and see, but find RegEx confusing at best of times - that site is handy though!

I would think most would put file extensions if needed as may want to just exclude myfile.txt and NOT myfile.doc etc

Can't get "get" assessor working - so will leave it until you have some time - no hurry whatsoever, was just working through my list of things to test :)

Agree it is better to always encase in quotes when entering them - can put note in GUI perhaps for that as well - solves issue of spaces in the middle - when RegEx fixed

RFBomb commented 2 years ago

Try this regex

"\\s*(?<VALUE>\".*\"|.+?)(?:\\s*)"

It might be closer. | acts as an 'or' so it's searching for any characters wrapped within a set of quotes OR any characters separated by spaces.

PCAssistSoftware commented 2 years ago

Try this regex

"\\s*(?<VALUE>\".*\"|.+?)(?:\\s*)"

It might be closer. | acts as an 'or' so it's searching for any characters wrapped within a set of quotes OR any characters separated by spaces.

Unfortunately not getting any matches in Regex101 for that with test strings of

"file.txt" "file 1.doc" "myfile.xlsx"

and

file.txt file 1.doc myfile.xlsx

RFBomb commented 2 years ago

Yea, regex101 for a mixed bag with that one. I’ll play around later

On Tue, Jan 18, 2022 at 9:32 AM Darren Rose @.***> wrote:

Try this regex

"\s(?\".\"|.+?)(?:\s*)"

It might be closer. | acts as an 'or' so it's searching for any characters wrapped within a set of quotes OR any characters separated by spaces.

Unfortunately not getting any matches in Regex101 for that with test strings of

"file.txt" "file 1.doc" "myfile.xlsx"

and

file.txt file 1.doc myfile.xlsx

— Reply to this email directly, view it on GitHub https://github.com/tjscience/RoboSharp/issues/121#issuecomment-1015469307, or unsubscribe https://github.com/notifications/unsubscribe-auth/AE34HF7ADDU63Y4VYDYXATDUWV2X3ANCNFSM5JZXVJ7Q . You are receiving this because you were mentioned.Message ID: @.***>

RFBomb commented 2 years ago

This regex should work:

(?<VALUE>\\".+?\\"|^\\s*.+?\\s+?|^\\s*.+?\\s*?$|[^\\s].+?$)

image

PCAssistSoftware commented 2 years ago

not working for me?

image

RFBomb commented 2 years ago

Sorry, o pasted it formatted for c# (double backlash)

Here's the copy paste from regex101:


(?<VALUE>\".+?\"|^\s*.+?\s+?|^\s*.+?\s*?$|[^\s].+?\s+?|[^\s].+?$)
PCAssistSoftware commented 2 years ago

ignore, for some reason your RegEx was showing with double slashes - removed them and now working

(?<VALUE>\".+?\"|^\s*.+?\s+?|^\s*.+?\s*?$|[^\s].+?$)

But still not working for all potential entries - see below

Think we definitely need to enforce all entries being encasesd in quotes to make this more reliable / easier?

image

Should be three

*.txt file.doc myfile.doc

but only getting two as combining second one e.g. file.doc "myfile.doc"

RFBomb commented 2 years ago

[^]excludes characters from capture, so [^\s].* captures all non-white space characters.

Update to avoid capturing quotes too: [^\s\"]

PCAssistSoftware commented 2 years ago

Thanks

(?<VALUE>\".+?\"|^\s*.+?\s+?|^\s*.+?\s*?$|[^\s\"].+?\s+?|[^\s].+?$)

Seems to work well on the combinationsI have tested so far :)

RFBomb commented 2 years ago

I'll still take a look at it tonight/tomorrow to see if it can be simplified, as it's currently fairly complicated regex.

PCAssistSoftware commented 2 years ago

Can I request that we also support having comma (,) as an option to split entries as well as a space

so that

file1.txt,file2.txt

would work just as well as

file1.txt file2.txt

So we finish up with rule of separate entries with a space or a comma, or enclose any paths or filenames containing spaces in quotes

RFBomb commented 2 years ago

Yea, that will be easy to build into the regex.

[\s*,?\s*]

Any number of spaces, optional comma, any number of spaces.

RFBomb commented 2 years ago

Even better, this would work well, super simplified.

(?<VALUE>\".+?\"|[^\s\,\"]+)

Just took a bit of playing around image

PCAssistSoftware commented 2 years ago

Excellent, that is much simpler and a quick test of pasting in all my previous test strings, works really well :) and covers what we wanted

RFBomb commented 2 years ago

Note, if they provide:

myfile, my file, "my file" The result is:

Myfile
My
File
"My file"

So anything with a space should still be quoted.

But, that's only applying if they use the setter that's marker as obsolete. So really at that point it's on them.

I believe the regex we've come up with should handle any existing programs that were properly formatted just fine.

PCAssistSoftware commented 2 years ago

That makes sense

PCAssistSoftware commented 2 years ago

Have added a tooltip to the ExcludeFiles and ExcludeDirectories text boxes advising best usage - hope you agree?

image

RFBomb commented 2 years ago

Yea that looks good.

PCAssistSoftware commented 2 years ago

Resolved by #139, #142 and #143