Why are the extracted windows slightly longer than 1 second?

First of all, thank you so much for this repository. I am doing some research in the speech domain, and this has been very helpful.

However, I have a couple of questions about it.

1) Why are the extracted windows slightly longer than 1 second rather than exactly 1 second?
2) Can the 1-second window length be increased to several seconds? How would this affect training?