It's quite possible to do an efficient rolling CRC. Let me know if you're interested in help with that. I could help with the mathematical theory, and practical implementations.
For the purposes of a rolling CRC, it is sensible to avoid doing any initial/final XOR as often done in classic CRC algorithms. That would just add unnecessary complexity for use in a rolling CRC.
Classically, CRCs have been described as calculating the remainder of binary long-division. Another way of describing it is, polynomial reduction using a reducing polynomial. Then CRCs can be thought of as operations in a Galois field such as GF(2^32), where the CRC polynomial is the reducing polynomial of the Galois Field.
A CRC "permutes" in each byte cycle by being multiplied by 0x100 and then reduced by the polynomial (ie modulo the polynomial in the Galois field). So a CRC value permuted through n bytes can be thought of as being multiplied by pow(0x100, n) modulo the polynomial in the Galois field.
So when calculating a n-byte sliding window, the oldest byte can be "removed" from the CRC value by calculating the CRC of the byte "delayed" by n bytes.
A look-up table can be made of all 256 byte values "delayed" by n bytes, for an efficient "removal" of each oldest byte from the sliding window in a rolling CRC.
Because CRCs combine linearly, a 256-entry look-up table can be constructed by first calculating the CRC of the 8 1-bit values "delayed" by n bytes, then building each of the 256 entries by XORing the appropriate combination of the 8 values.
It's quite possible to do an efficient rolling CRC. Let me know if you're interested in help with that. I could help with the mathematical theory, and practical implementations.